258 Commits

Author SHA1 Message Date
James R. Barlow
7baaf00a38 Fix wrong return code tested 2018-07-05 13:49:22 -07:00
James R. Barlow
47885f4230 Remove initial qpdf.repair
Since pikepdf is doing the work, the initial repair takes time and gives
little benefit.

It turns out not to be worthwhile to save the results of PdfInfo parsing,
since the time to save this seems to exceed the cost of recalculating it
since the "weave" code, at least for small files.
2018-07-03 16:50:05 -07:00
James R. Barlow
39c44bdd2f Don't use --optimize in test since jbig2enc is not always installed 2018-07-02 23:48:23 -07:00
James R. Barlow
2974929b26 Make jpeg/png quality tunable args 2018-07-02 22:22:59 -07:00
James R. Barlow
7200623007 Fix installation for Python 3.7
Need to use a private fork of ruffus for Python 3.7. Backward compatible with Python 3.6 for ruffus 2.6.3.

Disable locale checking for 3.7 since the various fixes in that release should make it unnecessary.
2018-07-02 16:47:14 -07:00
James R. Barlow
02b3ca6862 Compress test images more heavily 2018-06-28 21:40:12 -07:00
James R. Barlow
bc90f40a8f Replace all Pix.read with Pix.open 2018-06-28 15:13:26 -07:00
James R. Barlow
faad1fc58a Reactivate two tests that weren't using their fixtures properly 2018-06-23 01:54:09 -07:00
James R. Barlow
b0dbaeafc5 Cleanup unused imports 2018-06-23 01:47:53 -07:00
James R. Barlow
78a686ecb4 Consider qpdf behavior on algo4 a pass
qpdf opens files with null user password, so do the same.
2018-05-25 00:33:31 -07:00
James R. Barlow
68d8642988 Found out this test was extremely slow - no reason to actually use a large file 2018-05-24 22:22:51 -07:00
James R. Barlow
16f70ff054 Main changeset for pikepdf-based refactor of pdfinfo 2018-05-24 22:22:01 -07:00
James R. Barlow
ca297fd26b Update tests 2018-05-11 02:33:44 -07:00
James R. Barlow
72253d09fa Add arguments to control optimization 2018-05-10 22:23:24 -07:00
James R. Barlow
24b0adfacc Merge branch 'master' into develop 2018-05-10 20:54:55 -07:00
James R. Barlow
606d3e6aa1 Remove tests that exercise obsolete features (tesseract, -g) 2018-05-10 20:33:32 -07:00
James R. Barlow
687a7954d6 test_main: uses leptonica 2018-05-10 19:05:31 -07:00
James R. Barlow
b8f3ead541 Remove tesseract renderer entirely
Grafting lets us work with older Tesseract versions as if they could use
sandwich, so there is no point in keeping it. It's been deprecated for a
long time now anyway.
2018-05-10 14:06:13 -07:00
James R. Barlow
9226f8a5d1 Trap PDF/A-3 errors on old Ghostscript 2018-05-04 15:29:43 -07:00
James R. Barlow
7cf83c77ca Merge branch 'feature/pdfa3' 2018-05-03 16:45:57 -07:00
James R. Barlow
76276f61e5 Split out rotation related tests 2018-05-01 23:51:35 -07:00
James R. Barlow
bfd26e6ec6 Tests: confirm OCR layer copied 2018-05-01 23:16:41 -07:00
James R. Barlow
b5d7e9cbb0 Fix all issues with rotations
All tests now pass
2018-05-01 22:50:20 -07:00
James R. Barlow
a9abe13185 Remove the old tesseract pdf_renderer 2018-05-01 17:31:34 -07:00
James R. Barlow
0934905493 Don't suppress error message from config_notfound
Since it showed up in s390x bionic
2018-04-25 21:58:18 -07:00
James R. Barlow
df87e21c85 Add support for PDF/A-3
No ability to attach files however
2018-04-20 00:06:55 -07:00
Hugo
d761d80750 Use more standard __version__ rather than PILLOW_VERSION (#257) 2018-04-19 23:35:32 -07:00
James R. Barlow
0b10db91be Fix regression: Disable Ghostscript JPEG passthrough entirely 2018-04-17 17:00:24 -07:00
James R. Barlow
10aa59f674 v6.1.4 fix test suite regression with Ghostscript 9.23 2018-04-12 15:16:54 -07:00
James R. Barlow
40ef4f0bbe Add new argument --skip-repair to skip the repair step 2018-03-28 00:54:58 -07:00
James R. Barlow
530eae3898 Fix test_main missing file_claims_pdfa 2018-03-26 15:33:53 -07:00
James R. Barlow
bc56b8e058 Move metadata tests to new test_metadata 2018-03-26 01:49:25 -07:00
James R. Barlow
874ec6a87f Add missing fixture to test_unpaper 2018-03-24 22:24:14 -07:00
James R. Barlow
c138161fae Tests: more cleanup 2018-03-24 15:35:57 -07:00
James R. Barlow
e48590d66c Refactor out unpaper-specific tests 2018-03-24 15:21:44 -07:00
James R. Barlow
5b1c8541fc Review some skipped tests to make sure reasons still valid 2018-03-24 15:13:23 -07:00
James R. Barlow
e5e011021b Remove the OCRMYPDF_program environment variables
Really, this was just replicating the functionality of the PATH
environment variable, and users probably do that anyway.
2018-03-24 15:09:08 -07:00
James R. Barlow
11d74dea09 Remove the OCRMYPDF_program environment variables
Really, this was just replicating the functionality of the PATH
environment variable, and users probably do that anyway.
2018-03-24 15:07:02 -07:00
James R. Barlow
6756016572 Add license notice to all files
Source files to GPL3

Exceptions:
-tests/spoof/* to MIT
-hocrtransform.py
-_unicodefun.py

Test resources to CC BY-SA 4.0 except when otherwise noted.

Add GPL license.
2018-03-24 02:33:24 -07:00
James R. Barlow
d700154e0e Fix regressions after --skip-text improvements 2018-03-24 02:24:45 -07:00
James R. Barlow
8159cc6b88 Skip one test that fails for qpdf 8.0.[0,1], due to qpdf regression 2018-03-09 07:57:22 -08:00
James R. Barlow
4046766ca5 Fix Python 3.5 test suite failure on symlinks
Did not account for API difference in pathlib
2018-03-02 16:57:46 -08:00
James R. Barlow
74ca736333 Issue #223: improve text of encrypted PDF error message 2018-02-27 15:08:22 -08:00
James R. Barlow
e7bcb95635 Fix pylint errors 2018-02-24 11:59:01 -08:00
James R. Barlow
3de83627a9 Handle output to /dev/null or directory (#219)
Previously we threw an exception if the output name was a directory (only after doing OCR), and we triggered a PermissionError when trying to flip the permission bits of /dev/null, due to the shutil.copyfile implementation. Instead of copying the file, use shutil.copyfileobj, which should also respect umask etc.
2018-02-19 22:15:07 -08:00
James R. Barlow
a9da839c39 Add vector-only PDF test case 2018-02-08 00:17:35 -08:00
James R. Barlow
1dfc32d7e6 Preserve "text as curves" vector content
The checking logic was never updated to deal with a pure vector file with no text that needs an OCR layer. This is doable, so allow it.
2018-02-07 16:05:48 -08:00
James R. Barlow
ad7a4476db hugemono.pdf needs --max-image-mpixels to pass with Pillow 5.0 2018-01-10 16:55:18 -08:00
James R. Barlow
4812b20fb2 Fix tesseract_noop.py generating wrong size of output PDF in tests
This caused trouble before with test_deskew
2018-01-10 16:35:31 -08:00
James R. Barlow
882fc2257c Add --max-image-mpixels argument to support Pillow 5.0 2018-01-10 15:43:59 -08:00