177 Commits

Author SHA1 Message Date
James R. Barlow
9dad40b5a3 Major overhaul of the Dockerfile
Switched from Ubuntu to debian:stretch because stretch has more recent
versions of our binary packages and starts smaller.  In particular,
stretch has both pillow==2.9.0 and reportlab==3.2.0 available as system
packages, which saves the considerable hassle of installing a toolchain.

Instead, a pyvenv is set up with access to system's site-packages (note:
needs two steps), making the binary-dependent packages available.  Then
the remaining packages are installed into the pyvenv with --no-cache-dir
to avoid saving files. And there we are.

Image is still very large (>500 MB), but programs like reportlab require
font rendering capabilities so they pull in large portions of the Linux
graphics stack. Not much will shrink that.
2015-08-20 01:25:31 -07:00
James R. Barlow
630e6cbf1e pip chokes on Unicode filenames? 2015-08-18 23:56:30 -07:00
James R. Barlow
cc161780df Replace fileinput with regular open-replace
fileinput is supposed to save time in these cases but it's not capable
of doing both in-place rewrites and working with a non-ascii encoding.
This was not noticed until characters outside of ASCII were picked up
by tesseract and saved in a HOCR file. Rework some surrounding code as
well and add multilingual test cases.
2015-08-18 23:27:50 -07:00
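
The open-replace approach this commit describes can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the helper name `replace_in_file` and its parameters are hypothetical, and it assumes the file is small enough to read into memory and is encoded as UTF-8.

```python
def replace_in_file(path, old, new, encoding="utf-8"):
    """Rewrite the file at `path` in place, replacing `old` with `new`.

    Hypothetical helper: reads the whole file with an explicit encoding,
    then writes it back, so non-ASCII text (e.g. OCR output in an hOCR
    file) survives the round trip.
    """
    with open(path, "r", encoding=encoding) as f:
        text = f.read()
    with open(path, "w", encoding=encoding) as f:
        f.write(text.replace(old, new))
```

Reading and writing in two explicit steps gives up the convenience of fileinput, but the encoding is stated at both ends rather than left to the platform default.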
James R. Barlow
0ec13d3a17 Fix test cases: minor issues
- os.environ was modified directly when the whole suite ran, breaking
  subsequent tests
- no longer trusting JHOVE for PDF/A validation
2015-08-16 01:57:35 -07:00
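
One way to keep a test from leaking changes to os.environ into later tests is to snapshot and restore it. The sketch below is an assumption about how that might look, not the fix made in this commit; `patched_environ` is a hypothetical name.

```python
import os
from contextlib import contextmanager


@contextmanager
def patched_environ(**overrides):
    """Temporarily apply `overrides` to os.environ, then restore it.

    Hypothetical helper: the snapshot is restored even if the body
    raises, so later tests see the original environment.
    """
    saved = os.environ.copy()
    os.environ.update(overrides)
    try:
        yield
    finally:
        os.environ.clear()
        os.environ.update(saved)
```

A test would wrap only the code that needs the altered environment in `with patched_environ(NAME="value"):`, where the variable name is whatever the test requires.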
James R. Barlow
85af0f0d03 Add test case for blank PDF page 2015-08-14 00:46:50 -07:00
James R. Barlow
9247ea00bf Improve ruffus exception handling
ruffus swallows the return code if, in the process of handling an
exception, we hit an error in ruffus' own code, which can happen.  So pick through
its error stack and find out if there's an interesting return code in
there.  Had to use eval() of all things.

Also suppress the stack trace for normal error conditions that don't
need one.
2015-08-11 02:19:46 -07:00
James R. Barlow
a1238d7bf9 Document override binary test 2015-08-11 00:44:43 -07:00
James R. Barlow
2d63268f0f Work around JHOVE bug for now, so that the test passes 2015-08-11 00:23:48 -07:00
James R. Barlow
1cb5f6a90d Refactor exit codes; test for missing tessdata
Some versions of tesseract installed by homebrew end up without a
functional tessdata folder, and tesseract is not helpful in this
situation, so add a new test to make sure our output is at least
indicative of the problem.

In the process of properly handling return codes I discovered
test_override_metadata triggers a NPE inside JHOVE probably due to the
Unicode character checking.  This could be specific to my JRE (1.6.0_65,
Oracle) but it's probably JHOVE's fault.  A valid PDF/A (per Acrobat)
is still generated.
2015-08-11 00:17:02 -07:00
James R. Barlow
8fe54d1a5c Add new test case to check invalid PDF/A case
It revealed a regression - return code not the same as v2.x for invalid
PDF/A.  It's also not easy to get the return code out of ruffus.  Will
need to tweak the final step of the pipeline.
2015-08-10 13:57:28 -07:00
James R. Barlow
97015ef775 Add a test case to check on the @argumentsfile syntax 2015-08-05 23:17:38 -07:00
James R. Barlow
2744dafb74 New test case: ensure metadata is preserved from input to output 2015-08-05 17:09:38 -07:00
James R. Barlow
7b268dbe1a Remove duplication in test case 2015-08-05 16:57:04 -07:00
James R. Barlow
6a160d22fe Update release notes, add copyrights 2015-07-28 04:36:58 -07:00
James R. Barlow
e35526192c More test cases 2015-07-28 03:02:35 -07:00
James R. Barlow
bea57bdded More test cases for other parameters 2015-07-28 02:31:18 -07:00
James R. Barlow
a3f37de9b5 Test cases for --tesseract-timeout 2015-07-28 01:47:30 -07:00
James R. Barlow
8508141314 Drop nose, all tests working reasonably again
Although the real issue was that the ruffus pipeline cannot be executed
twice in the same process due to its reliance on global variables.

The new OO pipeline in ruffus 2.6 would be one resolution that would
allow for more comprehensive testing as opposed to farming out the
execution to subprocess and inspecting the results, as is currently
done.
2015-07-28 00:43:22 -07:00
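
The subprocess-based testing mentioned in this commit could look something like the sketch below: each run happens in a fresh interpreter, so module-level global state cannot carry over between runs. The helper name `run_cli` and the timeout value are assumptions for illustration, not taken from the test suite.

```python
import subprocess
import sys


def run_cli(args, timeout=60):
    """Run a command in a new process; return (returncode, stdout, stderr).

    Hypothetical helper: the caller inspects the exit code and captured
    output instead of importing and re-running the pipeline in-process.
    """
    proc = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text
        timeout=timeout,
    )
    return proc.returncode, proc.stdout, proc.stderr


# Stand-in command: a child interpreter that exits with a known code.
returncode, out, err = run_cli([sys.executable, "-c", "import sys; sys.exit(3)"])
```

The cost of this design is that only the exit code and output are visible to the test, which is the limitation the commit message notes.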
James R. Barlow
1c95597882 nose can't really handle external tests so looking into py.test instead
Specifically it trips over the need to reimport ocrmypdf.main.  That in
turn raises questions about whether to make that function into an
external script that imports ocrmypdf... or something else.  Would be
possible with a loop that manipulates sys.argv and then reloads
ocrmypdf.main; might need that anyway.
2015-07-27 22:07:04 -07:00
James R. Barlow
b40eec4cb0 Add --oversample test for hocr rendering 2015-07-27 17:18:02 -07:00
James R. Barlow
7bcd48c269 Add test to confirm that metadata is transferred to final PDF/A 2015-07-27 16:11:51 -07:00
James R. Barlow
2e7cd52c0f Improve argument handling, test cases 2015-07-27 15:39:54 -07:00
Jim Barlow
0c5c208db0 Goodbye, so long, farewell, shell... 2015-07-25 00:57:07 -07:00
Jim Barlow
b2168e11db Require Py3 for tests 2015-07-22 11:21:33 -07:00
Jim Barlow
6d5d8be708 New test: check skew 2015-07-22 04:00:59 -07:00
Jim Barlow
ce2dbdf372 Add another test 2015-07-22 03:16:19 -07:00
Jim Barlow
ec8a35a7a6 Basic test cases 2015-07-22 02:59:25 -07:00