Ghostscript 9.21 does not seem to accept Unicode above U+FFFF. Previous
versions did, but it now exits with a rangecheck error (-15).
Reject such characters on the command line for now. A complete fix
would also need to check the input PDF’s metadata.
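A minimal sketch of such a check, assuming a hypothetical helper named
has_non_bmp (the name and placement are illustrative, not taken from the
actual change):

```python
def has_non_bmp(text: str) -> bool:
    """Return True if text contains any code point above U+FFFF."""
    # ord() gives the Unicode code point of each character in a Python 3 str.
    return any(ord(ch) > 0xFFFF for ch in text)
```

A command-line validator could call this on each metadata argument and
reject the value before Ghostscript ever sees it.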
Because ruffus doesn’t handle exceptions well I tended to call sys.exit
to make sure we got out of dodge when needed. However, sys.exit is not
ideal for the Python API this is moving towards, so this introduces
proper exceptions for the various cases that retain suggested error
codes. Only __main__.py should call sys.exit now; everything else must
raise an exception.
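The pattern can be sketched roughly as follows. All class names, the
run/main functions, and the exit code values are hypothetical and only
illustrate the idea of exceptions that carry a suggested exit code:

```python
import sys


class ExitCodeError(Exception):
    """Hypothetical base class: an error that suggests a process exit code."""
    exit_code = 1  # illustrative value


class EncryptedPdfError(ExitCodeError):
    """Hypothetical subclass for encrypted input files."""
    exit_code = 8  # illustrative value


def run(argv):
    # Stand-in for the real pipeline: library code raises, it never exits.
    if argv and argv[0].startswith("encrypted"):
        raise EncryptedPdfError("input PDF is encrypted")


def main(argv):
    # Only the entry point translates exceptions into exit codes.
    try:
        run(argv)
    except ExitCodeError as e:
        print(e, file=sys.stderr)
        return e.exit_code
    return 0
```

Under these assumptions, __main__.py would be the only place that does
sys.exit(main(sys.argv[1:])), while API callers can catch the exception
themselves.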
For now, a worker that hits a fatal error logs its messages rather than
passing along an exception object with the fatal error message, mainly
because ruffus doesn’t properly marshal the exception object. So we
just check “what is the name of the exception class that caused ruffus
to throw a RethrownJobError?”
Also fixed along the way: the wrong return code being shown for
encrypted PDF checking, and incorrect use of str.find (e.output.find)
in boolean logic (str.find returns -1 when the substring is not found,
and -1 is truthy).
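The str.find pitfall can be illustrated as below. The function names and
sample strings are made up for the example:

```python
def buggy_contains(haystack: str, needle: str) -> bool:
    # Wrong: find() returns -1 (truthy) when missing and 0 (falsy)
    # when the match is at the very start of the string.
    return bool(haystack.find(needle))


def contains(haystack: str, needle: str) -> bool:
    # Correct: the `in` operator tests membership directly.
    return needle in haystack
```

Comparing the result explicitly, as in haystack.find(needle) != -1, would
also work, but `in` states the intent more directly.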
Turns out this occurred in any case where pdf-renderer hocr was used
and a tesseract timeout or error occurred. We created a replacement
page based on the unrotated page dimensions instead of the input image’s
dimensions.
Executing a package with python -m packagename will check for
__main__.py inside the package. In other words, main.py should always
have been named __main__.py.
In the unlikely event that someone depends on "import ocrmypdf.main"
being meaningful, main.py continues to exist and replicates the
behavior of __main__. (It's unlikely because import ocrmypdf.main does
unpythonic ruffus-related things at import time, essentially
configuring itself to work with sys.argv. To fix another day.)
This should solve the problem of Debian needing to run test suites
before installation and afterwards for continuous integration without
having to patch either file, as python -m ocrmypdf will follow import
order. That is, if the current directory contains "ocrmypdf/" (e.g.
staging a new version) then that will be tested, else sys.path will
be checked.
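The lookup behavior described above can be demonstrated with a throwaway
package. The package name demo_pkg and the helper run_demo are invented
for this sketch, which assumes a default interpreter configuration (no
safe-path option in effect):

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_demo() -> str:
    with tempfile.TemporaryDirectory() as tmp:
        # Build a tiny package containing a __main__.py.
        pkg = Path(tmp) / "demo_pkg"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "__main__.py").write_text('print("ran demo_pkg/__main__.py")\n')
        # With -m, the current directory is searched first, so the
        # package staged in tmp is the one that gets executed.
        result = subprocess.run(
            [sys.executable, "-m", "demo_pkg"],
            cwd=tmp, capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
```

Running the same command from a directory without demo_pkg/ would fall
through to whatever sys.path provides, which is the installed copy in the
ocrmypdf case.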