OCRmyPDF

mirror of https://github.com/ocrmypdf/OCRmyPDF.git synced 2025-07-15 04:51:25 +00:00

Author	SHA1	Message	Date
James R. Barlow	9dad40b5a3	Major overhaul of the Dockerfile. Switched from Ubuntu to debian:stretch because stretch has more recent versions of our binary packages and starts smaller. In particular, stretch has both pillow==2.9.0 and reportlab==3.2.0 available as system packages, which saves the considerable hassle of installing a toolchain. Instead, a pyvenv is set up with access to the system's site-packages (note: needs two steps), making the binary-dependent packages available. Then the remaining packages are installed into the pyvenv with --no-cache-dir to avoid saving files. And there we are. Image is still very large (>500 MB), but programs like reportlab require font rendering capabilities so they pull in large portions of the Linux graphics stack. Not much will shrink that.	2015-08-20 01:25:31 -07:00
James R. Barlow	630e6cbf1e	pip chokes on Unicode filenames?	2015-08-18 23:56:30 -07:00
James R. Barlow	cc161780df	Replace fileinput with regular open-replace. fileinput is supposed to save time in these cases, but it's not capable of doing both in-place rewrites and working with a non-ASCII encoding. This was not noticed until characters outside of ASCII were picked up by tesseract and saved in a HOCR file. Rework some surrounding code as well and add multilingual test cases.	2015-08-18 23:27:50 -07:00
James R. Barlow	0ec13d3a17	Fix test cases: minor issues. (1) os.environ was directly modified when the whole suite ran, breaking subsequent tests; (2) no longer trusting JHOVE for PDF/A validation.	2015-08-16 01:57:35 -07:00
James R. Barlow	85af0f0d03	Add test case for blank PDF page	2015-08-14 00:46:50 -07:00
James R. Barlow	9247ea00bf	Improve ruffus exception handling. ruffus swallows the return code if, in the process of handling an exception, we hit an error in ruffus' own code, which can happen. So pick through its error stack and find out if there's an interesting return code in there. Had to use eval() of all things. Also suppress the stack trace for normal error conditions that don't need one.	2015-08-11 02:19:46 -07:00
James R. Barlow	a1238d7bf9	Document override binary test	2015-08-11 00:44:43 -07:00
James R. Barlow	2d63268f0f	Work around JHOVE bug for now, so that the test passes	2015-08-11 00:23:48 -07:00
James R. Barlow	1cb5f6a90d	Refactor exit codes; test for missing tessdata. Some versions of tesseract installed by homebrew end up without a functional tessdata folder, and tesseract is not helpful in this situation, so add a new test to make sure our output is at least indicative of the problem. In the process of properly handling return codes I discovered test_override_metadata triggers a NPE inside JHOVE, probably due to the Unicode character checking. This could be specific to my JRE (1.6.0_65, Oracle) but it's probably JHOVE's fault. A valid PDF/A (per Acrobat) is still generated.	2015-08-11 00:17:02 -07:00
James R. Barlow	8fe54d1a5c	Add new test case to check invalid PDF/A case. It revealed a regression: the return code is not the same as v2.x for invalid PDF/A. It's also not easy to get the return code out of ruffus. Will need to tweak the final step of the pipeline.	2015-08-10 13:57:28 -07:00
James R. Barlow	97015ef775	Add a test case to check on the @argumentsfile syntax	2015-08-05 23:17:38 -07:00
James R. Barlow	2744dafb74	New test case: ensure metadata is preserved from input to output	2015-08-05 17:09:38 -07:00
James R. Barlow	7b268dbe1a	Remove duplication in test case	2015-08-05 16:57:04 -07:00
James R. Barlow	6a160d22fe	Update release notes, add copyrights	2015-07-28 04:36:58 -07:00
James R. Barlow	e35526192c	More test cases	2015-07-28 03:02:35 -07:00
James R. Barlow	bea57bdded	More test cases for other parameters	2015-07-28 02:31:18 -07:00
James R. Barlow	a3f37de9b5	Test cases for --tesseract-timeout	2015-07-28 01:47:30 -07:00
James R. Barlow	8508141314	Drop nose; all tests working reasonably again. The real issue, though, was that the ruffus pipeline cannot be executed twice in the same process due to its reliance on global variables. The new OO pipeline in ruffus 2.6 would be one resolution that would allow for more comprehensive testing, as opposed to farming out the execution to subprocess and inspecting the results, as is currently done.	2015-07-28 00:43:22 -07:00
James R. Barlow	1c95597882	nose can't really handle external tests, so looking into py.test instead. Specifically, it trips over the need to reimport ocrmypdf.main. That in turn raises questions about whether to make that function into an external script that imports ocrmypdf... or something else. Would be possible with a loop that manipulates sys.argv and then reloads ocrmypdf.main; might need that anyway.	2015-07-27 22:07:04 -07:00
James R. Barlow	b40eec4cb0	Add --oversample test for hocr rendering	2015-07-27 17:18:02 -07:00
James R. Barlow	7bcd48c269	Add test to confirm that metadata is transferred to final PDF/A	2015-07-27 16:11:51 -07:00
James R. Barlow	2e7cd52c0f	Improve argument handling, test cases	2015-07-27 15:39:54 -07:00
Jim Barlow	0c5c208db0	Goodbye, so long, farewell, shell...	2015-07-25 00:57:07 -07:00
Jim Barlow	b2168e11db	Require Py3 for tests	2015-07-22 11:21:33 -07:00
Jim Barlow	6d5d8be708	New test: check skew	2015-07-22 04:00:59 -07:00
Jim Barlow	ce2dbdf372	Add another test	2015-07-22 03:16:19 -07:00
Jim Barlow	ec8a35a7a6	Basic test cases	2015-07-22 02:59:25 -07:00
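
Commit cc161780df above describes swapping fileinput for a plain open-and-replace so that in-place rewrites work with non-ASCII text. The following is only a minimal sketch of that general pattern, not the code from the commit: the function name replace_in_file, its parameters, and the choice of UTF-8 are assumptions made for illustration.

```python
import os
import tempfile


def replace_in_file(path, old, new, encoding="utf-8"):
    """Rewrite the file at `path`, replacing `old` with `new` on every line.

    Hypothetical helper: reads with an explicit encoding, writes the result
    to a temporary file in the same directory, then swaps it into place.
    """
    with open(path, "r", encoding=encoding) as src:
        lines = src.readlines()

    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding=encoding) as dst:
        for line in lines:
            dst.write(line.replace(old, new))

    # os.replace overwrites the destination if it already exists.
    os.replace(tmp_path, path)
```

Writing to a temporary file first means a failure partway through leaves the original file untouched; a simpler variant could read everything and reopen the same path for writing.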


177 Commits