ruffus swallows the return code if, while handling an exception, we
hit an error in ruffus' own code, which can happen. So pick through
its error stack and find out if there's an interesting return code in
there. Had to use eval() of all things.
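
A rough sketch of the idea of digging a return code out of stringified
error entries. The entry layout (task name, exception type name,
exception value as text) and the name find_exit_code are assumptions
made for illustration, not ruffus' actual structures. It also swaps in
ast.literal_eval() for eval(), since it only accepts literals.

```python
import ast


def find_exit_code(error_entries, default=1):
    """Return the first integer exit code found in SystemExit entries.

    error_entries is a hypothetical list of
    (task_name, exc_type_name, exc_value_text) tuples.
    """
    for task_name, exc_type_name, exc_value_text in error_entries:
        if not exc_type_name.endswith("SystemExit"):
            continue
        try:
            # Turn text such as "(4,)" or "4" back into a Python value.
            value = ast.literal_eval(exc_value_text)
        except (ValueError, SyntaxError):
            continue  # not a literal, nothing usable here
        if isinstance(value, tuple) and value:
            value = value[0]
        if isinstance(value, int) and not isinstance(value, bool):
            return value
    return default
```

If nothing usable is found, the caller gets a generic failure code
rather than a misleading zero.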
Also suppress the stack trace for normal error conditions that don't
need one.
Some versions of tesseract installed by homebrew end up without a
functional tessdata folder, and tesseract is not helpful in this
situation, so add a new test to make sure our output is at least
indicative of the problem.
In the process of properly handling return codes I discovered
test_override_metadata triggers an NPE inside JHOVE, probably due to the
Unicode character checking. This could be specific to my JRE (1.6.0_65,
Oracle) but it's probably JHOVE's fault. A valid PDF/A (per Acrobat)
is still generated.
Modified pipeline to fix regression and return the proper error code if
we did not produce a PDF/A as expected. The wrapper forces the output
to be PDF 1.3 which is not PDF/A compliant.
The funny thing is that in some cases JHOVE incorrectly states that a
file is PDF/A-1b compliant, well formed and valid, even when it is not
according to Acrobat XI and is missing the PDF/A metadata marker, as
far as I can tell. JHOVE may not be as beneficial as hoped.
It revealed a regression - return code not the same as v2.x for invalid
PDF/A. It's also not easy to get the return code out of ruffus. Will
need to tweak the final step of the pipeline.
The real issue, though, was that the ruffus pipeline cannot be executed
twice in the same process due to its reliance on global variables.
The new OO pipeline in ruffus 2.6 would be one resolution that would
allow for more comprehensive testing as opposed to farming out the
execution to subprocess and inspecting the results, as is currently
done.
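
For reference, the subprocess approach to testing looks roughly like
the sketch below. The child command is a stand-in written for this
example, not the real command line, and run_and_capture is a
hypothetical helper name.

```python
import subprocess
import sys


def run_and_capture(argv):
    """Run a command in a fresh process; return (exit code, stdout, stderr)."""
    proc = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout, proc.stderr


# Stand-in child process: complains on stderr and exits with code 4.
child = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('not PDF/A\\n'); sys.exit(4)",
]
code, out, err = run_and_capture(child)
```

Each run gets a clean interpreter, so global state left behind by one
run cannot affect the next, at the cost of only seeing the exit code
and the output streams.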
Specifically it trips over the need to reimport ocrmypdf.main. That in
turn raises questions about whether to make that function into an
external script that imports ocrmypdf... or something else. Would be
possible with a loop that manipulates sys.argv and then reloads
ocrmypdf.main; might need that anyway.
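
A minimal sketch of such a loop, assuming a module that reads sys.argv
when its top-level code runs. fake_main is a throwaway stand-in created
for the example, and reload_with_argv is a hypothetical helper name.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def reload_with_argv(module, argv):
    """Re-execute a module's top-level code with sys.argv replaced."""
    saved = sys.argv
    sys.argv = list(argv)
    try:
        return importlib.reload(module)
    finally:
        sys.argv = saved  # always restore, even if the module raises


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in module that records the arguments it saw at import time.
    Path(tmp, "fake_main.py").write_text(
        "import sys\nSEEN = list(sys.argv[1:])\n"
    )
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        fake_main = importlib.import_module("fake_main")
        results = []
        for args in (["in.pdf", "out.pdf"], ["--help"]):
            reload_with_argv(fake_main, ["fake_main"] + args)
            results.append(fake_main.SEEN)
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("fake_main", None)
```

Note that importlib.reload() only re-executes the module it is given.
State held in other already-imported modules is not reset by this.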