16 Commits

Author SHA1 Message Date
James R. Barlow
6e27ecd2b9 Finalize ‘exec’ migration and make it backward compatible for now 2017-01-18 17:40:50 -08:00
James R. Barlow
d33a50660d Replace most sys.exit() with raising exceptions
Because ruffus doesn’t handle exceptions well, I tended to call sys.exit
to make sure we got out of Dodge when needed.  However, sys.exit is not
ideal for the Python API this is moving towards, so this introduces
proper exceptions for the various cases that retain suggested error
codes. Only __main__.py should call sys.exit now, everyone else has to
throw an exception.
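
A minimal sketch of the pattern described above, assuming a hypothetical base exception that carries a suggested exit code. The class names, exit code values, and the `run_pipeline`/`main` functions are illustrative and are not taken from the project.

```python
import sys


class ExitCodeError(Exception):
    """Hypothetical base class: each subclass carries a suggested exit code."""
    exit_code = 1


class MissingDependencyError(ExitCodeError):
    exit_code = 3  # illustrative value, not the project's actual code


class EncryptedPdfError(ExitCodeError):
    exit_code = 8  # illustrative value, not the project's actual code


def run_pipeline(encrypted: bool) -> None:
    """Library code raises an exception instead of calling sys.exit()."""
    if encrypted:
        raise EncryptedPdfError("input PDF is encrypted")


def main(encrypted: bool = False) -> int:
    """Entry point: the only place that maps an exception to an exit code."""
    try:
        run_pipeline(encrypted)
    except ExitCodeError as e:
        print(f"error: {e}", file=sys.stderr)
        return e.exit_code
    return 0


# In __main__.py the result would be passed along as: sys.exit(main())
```

Keeping the exit code on the exception lets API callers catch a specific error type, while the command line entry point still reports the same numeric code.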

For now the worker raising a fatal exception is logging messages rather
than passing an exception object with the fatal error message, mainly
because ruffus doesn’t properly marshal the exception object, so we
just check “what is the name of the exception class that caused ruffus
to throw a RethrownJobError?”

Also fixed along the way was the wrong return code being shown for
encrypted PDF checking, and incorrect use of str.find (e.output.find)
in boolean logic (str.find returns -1 on failure to find, which is truthy).
2016-12-10 15:24:24 -08:00
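
The str.find pitfall mentioned in this commit can be illustrated with a short sketch. The `output` string here is made up for the example and is not real program output.

```python
output = "Error: file is encrypted"  # made-up sample text

# Buggy: str.find returns -1 when the substring is absent, and -1 is truthy.
buggy_hit = bool(output.find("not present"))

# Also buggy: a match at index 0 returns 0, which is falsy.
buggy_miss = bool(output.find("Error"))

# Correct: the `in` operator returns a real boolean.
found = "encrypted" in output
not_found = "not present" in output
```

Here `buggy_hit` is true although nothing matched, and `buggy_miss` is false although the text starts with the substring; the `in` tests give the intended answers.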
James R. Barlow
4ee9658e97 Move external program wrappers to ocrmypdf.exe package 2016-12-09 16:54:24 -08:00
James R. Barlow
dd1b84e7ba More refactoring - helpers.py 2016-12-09 16:31:08 -08:00
James R. Barlow
4c677e6c47 Extract pipeline out of __main__.py and into pipeline.py
This leaves __main__.py to handle command line arguments while pipeline.py
runs the pipeline - mostly. They are still somewhat intertwined, with
__main__.py doing essential things for pipeline.py, etc., and some
helper functions that could go in their own module.

All tests pass after this major refactor.
2016-12-09 16:17:12 -08:00
James R. Barlow
0a0ceda71f Start the documentation 2016-09-06 13:52:40 -07:00
James R. Barlow
12575d594a Improve PDF/A validity checking at end 2016-08-03 01:26:16 -07:00
James R. Barlow
2414b79ee6 More cleanup of exception related errors 2016-07-31 01:48:13 -07:00
James R. Barlow
f385772d21 Refactor "is this an iterable that's not a string?" test 2016-07-29 15:25:02 -07:00
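
One common way to write such a test is sketched below. The function name is hypothetical, and excluding `bytes` as well as `str` is an assumption of this sketch, not something the commit states.

```python
from collections.abc import Iterable


def is_nonstring_iterable(thing) -> bool:
    """Return True for iterables such as lists or tuples, but not for text.

    Hypothetical helper: str and bytes are iterable, yet callers usually
    want to treat them as single values rather than as collections.
    """
    return isinstance(thing, Iterable) and not isinstance(thing, (str, bytes))
```

A caller can use this to accept either a single argument or a list of arguments without accidentally iterating over the characters of a string.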
James R. Barlow
bbd02926e1 Add helpful error message for PDFs that use algorithm 4 2016-06-23 13:13:17 -07:00
James R. Barlow
f3b0434a87 Improve ability to capture error messages from tesseract on a crash 2016-02-19 03:48:49 -08:00
James R. Barlow
6a7ed7d359 Make logging output a lot more useful 2016-02-08 00:58:14 -08:00
James R. Barlow
1731ce2a44 Environment variables can now override default programs 2015-12-17 09:05:10 -08:00
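
A rough sketch of how an environment variable might override a default program name. The function and the variable name `OCRMYPDF_TESSERACT` used in the usage note are assumptions for illustration; the commit does not list the actual variable names.

```python
import os


def resolve_program(default, env_var, environ=None):
    """Return the program named by env_var if it is set, else the default.

    Hypothetical helper. `environ` defaults to os.environ and can be
    replaced with a plain dict for testing. An empty value falls back
    to the default.
    """
    environ = os.environ if environ is None else environ
    return environ.get(env_var) or default
```

For example, `resolve_program("tesseract", "OCRMYPDF_TESSERACT")` would return the default name unless the (assumed) variable points at another executable.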
James R. Barlow
2d63268f0f Work around JHOVE bug for now, so that the test passes 2015-08-11 00:23:48 -07:00
James R. Barlow
1cb5f6a90d Refactor exit codes; test for missing tessdata
Some versions of tesseract installed by homebrew end up without a
functional tessdata folder, and tesseract is not helpful in this
situation, so add a new test to make sure our output is at least
indicative of the problem.

In the process of properly handling return codes I discovered
test_override_metadata triggers an NPE inside JHOVE, probably due to the
Unicode character checking.  This could be specific to my JRE (1.6.0_65,
Oracle) but it's probably JHOVE's fault.  A valid PDF/A (per Acrobat)
is still generated.
2015-08-11 00:17:02 -07:00
Jim Barlow
9adb0d696f Prepare for Python packaging - move to ocrmypdf folder 2015-07-25 18:22:04 -07:00