156 Commits

Author SHA1 Message Date
James R. Barlow
a7b307af04 Looks like issue was negzero.pdf with qpdf 5.1.1 on travis, which is why osx passes
Reorganize and see if this is better now
2017-11-29 12:47:09 -08:00
James R. Barlow
731c9ea55e Set timeouts on the tests that seem to be stalling on travis (but not elsewhere) 2017-11-27 14:46:10 -08:00
James R. Barlow
92ca9e954c Fix test warning/failures, hopefully 2017-11-27 13:41:32 -08:00
James R. Barlow
56614fcaa4 Add support and tests for handling page count > ulimit - fixes issue #181 2017-11-27 00:32:35 -08:00
James R. Barlow
4d9169e15f Add merge ulimit test case 2017-11-26 23:34:36 -08:00
James R. Barlow
965de3a235 Test case for issue #200 2017-11-26 22:52:53 -08:00
James R. Barlow
7bbf6bc7f4 Travis didn't like LANG, use LC_ALL 2017-11-16 20:37:30 -08:00
James R. Barlow
40aa82ab41 Check that the locale is sane before allowing OCR to proceed 2017-11-16 17:18:02 -08:00
James R. Barlow
c7b8b6e18b Fix issue #194 - --sidecar creates blank txt file 2017-10-26 18:15:31 -07:00
James R. Barlow
4b7135f0e5 Add option to produce PDF/A-1B 2017-10-11 14:32:58 -07:00
James R. Barlow
952f0cca15 Dockerfiles: set LANG=C.UTF-8
Issue #184 to avoid issue with printing UTF-8 text to sidecar
2017-08-30 13:25:54 -07:00
James R. Barlow
b3097a2384 Fix broken test case related to language packs 2017-08-24 13:01:02 -07:00
James R. Barlow
f7ce8f44e9 Weaken the --user-words test so it will pass on Travis 2017-07-26 21:03:51 -07:00
James R. Barlow
52483072dc Add a differential test that checks tesseract uses supplied word list 2017-07-21 16:40:20 -07:00
James R. Barlow
7f0b8621f3 Tests: accept rich path objects without having to str() everything 2017-07-21 16:39:22 -07:00
James R. Barlow
cd8db60b06 Crash test all renderers, not just two 2017-07-21 14:10:02 -07:00
James R. Barlow
1aa34f5d2e Make some interfaces accepting of both str-paths and Path objects 2017-07-21 13:28:30 -07:00
James R. Barlow
d792ef7222 Give the ‘auto’ renderer setting more test covfefe 2017-06-13 13:13:58 -07:00
James R. Barlow
2c24f67deb Rename “tess4” renderer to “sandwich” and make it default in Tess 3.05.01
Tesseract 3.05.01 backported the textonly_pdf=1 which allows the use
of this superior PDF renderer prior to 4.00 alpha. This means that
the tess4 name is no longer accurate, so call it a sandwich because of
its merge-preserve characteristic. Preserve the tess4 name. Fix the
documentation and tests to reflect this.

Make it the default, because it’s better. It does not have the issues
the “tesseract” renderer does prior to Tess 3.05.00 with rendering
PDFs that Ghostscript corrupts, and it produces better output without
re-rastering.

Deprecate some old stuff to avoid the test suite growing obscenely
large.
2017-06-13 13:09:12 -07:00
James R. Barlow
28341b755f Refactor common test fixtures 2017-05-29 12:47:55 -07:00
James R. Barlow
08e47117a3 Rename pageinfo to pdfinfo 2017-05-19 15:48:23 -07:00
James R. Barlow
8694f8d2eb Replace magic strings colorspace and encoding with Enums 2017-05-18 22:32:27 -07:00
James R. Barlow
56d2aae963 Refactor from ImageInfo index to attribute accessing 2017-05-18 18:39:14 -07:00
James R. Barlow
caee5b1428 Access PageInfo instance variables instead of dictionary 2017-05-18 17:12:04 -07:00
James R. Barlow
cd04ae6949 Refactor PdfInfo(str(filename)) -> PdfInfo(filename) 2017-05-18 16:43:50 -07:00
James R. Barlow
6a0b68298f Refactor pdf_get_all_pageinfo to PdfInfo 2017-05-18 16:31:18 -07:00
James R. Barlow
e1e9135e93 Test suite: tidy up imports 2017-05-14 23:15:29 -07:00
James R. Barlow
96045e98f4 Update develop with master changes
We’re well out of the “trivial updates” zone
2017-05-11 22:54:27 -07:00
James R. Barlow
01b7205e2c Ensure skipped pages are explained in sidecars 2017-05-11 00:43:36 -07:00
James R. Barlow
183eafa587 Implement sidecar text files (#126) 2017-05-10 15:22:44 -07:00
James R. Barlow
01a1c2b576 Implement --pdfa-image-compression to control Ghostscript’s compression
Fixes #163
2017-05-09 16:37:29 -07:00
James R. Barlow
c97ea1f2a9 Update high DPI test case to confirm the output image is not downsampled 2017-05-06 22:34:01 -07:00
James R. Barlow
93e802f473 Fix issue #163, color and grayscale images JPEG compressed when not needed 2017-05-06 22:27:25 -07:00
James R. Barlow
aa859a4139 Fix #156 - NoneType has no ‘getObject’ for pages with no /Contents 2017-05-01 15:46:15 -07:00
James R. Barlow
b9b12e2879 Ensure that ocrmypdf stops and reports an error if Ghostscript fails
Past behavior was to continue and let ruffus puke eventually
2017-05-01 15:44:21 -07:00
James R. Barlow
554fcc8b9d Add test case for #152 2017-04-18 15:20:25 -07:00
James R. Barlow
89599b4812 Drop Python 3.4 compatibility 2017-03-29 15:46:53 -07:00
James R. Barlow
88ef2718f1 Reject high Unicode metadata at command line
Ghostscript 9.21 does not seem to accept Unicode above U+FFFF. Previous
versions did, but it now exits with a rangecheck error (-15).

Reject on the command line for now. Complete fix would also need to
check input PDF’s metadata.
2017-03-28 11:08:38 -07:00
James R. Barlow
e71e8ca3ad Workaround for GS VMerror -25 bug
Avoid inserting docinfo keys that would be translated to null strings,
to avoid running afoul of
https://bugs.ghostscript.com/show_bug.cgi?id=697684
2017-03-28 11:05:43 -07:00
James R. Barlow
199de96cff Ghostscript 9.21 seems to have a regression related to Unicode metadata 2017-03-24 15:15:46 -07:00
James R. Barlow
8ddbe81513 Fix issue #147: unpaper loses DPI information, affects --pdf-renderer tess4 2017-03-24 13:23:03 -07:00
James R. Barlow
f035cb1088 Fixed issue #142 — closed streams raise an exception on fork attempt 2017-03-13 15:52:57 -07:00
James R. Barlow
72660d0dec MacOS skip the one test that needs poppler, to save installing poppler 2017-03-11 17:03:26 -08:00
James R. Barlow
4a1fec8328 Improvements to macOS test and work on homebrew tap autobrew
Squashed commits:
[3f06c1e] Try setting up homebrew tap autobuilding
[01532f1] Strict mode error in brew
2017-03-11 17:00:54 -08:00
James R. Barlow
7cd2770a13 Fix issue #137 - proportions of non-square resolution distorted
Distortion mainly affected --force-ocr
2017-02-26 17:13:16 -08:00
James R. Barlow
d1a0065ef8 Create test case for Form XObjects 2017-02-14 12:51:15 -08:00
James R. Barlow
005216bc57 Support ocrmypdf-tess4 2017-01-29 18:26:52 -08:00
James R. Barlow
8c17c9918e Add documentation and test cases for --tesseract-config
This parameter has existed for a long time but never really got any
attention.
2017-01-28 22:06:51 -08:00
James R. Barlow
9a15a4db10 Ensure specified destination is writable before starting pipeline process 2017-01-26 22:08:24 -08:00
James R. Barlow
55aeaec293 Autorotation check: Replace duplicated tests with parameterized test 2017-01-26 18:07:59 -08:00