291 Commits

Author SHA1 Message Date
James R. Barlow
5e20d1d554 metadata: Fix failing test on __getitem__['/CreationDate'] 2018-05-16 13:46:07 -07:00
James R. Barlow
6171de41bf optimize: move a lot of image scanning code to pikepdf 2018-05-14 22:21:53 -07:00
James R. Barlow
3254315127 Update test cache 2018-05-11 12:19:50 -07:00
James R. Barlow
ca297fd26b Update tests 2018-05-11 02:33:44 -07:00
James R. Barlow
72253d09fa Add arguments to control optimization 2018-05-10 22:23:24 -07:00
James R. Barlow
24b0adfacc Merge branch 'master' into develop 2018-05-10 20:54:55 -07:00
James R. Barlow
acc6698ab3 Make XML metadata test actually work 2018-05-10 20:37:10 -07:00
James R. Barlow
606d3e6aa1 Remove tests that exercise obsolete features (tesseract, -g) 2018-05-10 20:33:32 -07:00
James R. Barlow
687a7954d6 test_main: uses leptonica 2018-05-10 19:05:31 -07:00
James R. Barlow
abed8e034e Add metadata preservation test from stash 2018-05-10 16:43:28 -07:00
James R. Barlow
b8f3ead541 Remove tesseract renderer entirely
Grafting lets us work with older Tesseract versions as if they could use
sandwich, so there is no point in keeping it. It's been deprecated for a
long time now anyway.
2018-05-10 14:06:13 -07:00
James R. Barlow
9226f8a5d1 Trap PDF/A-3 errors on old Ghostscript 2018-05-04 15:29:43 -07:00
James R. Barlow
7cf83c77ca Merge branch 'feature/pdfa3' 2018-05-03 16:45:57 -07:00
James R. Barlow
8a9f174f63 Fix XMP validation issue with /CreationDate
Related to previous validation issue. If the /CreationDate had no
timezone, Ghostscript also creates invalid metadata. Work around this.
Also fix up PDF date decoding, and transcode dates to standardize them.
2018-05-03 16:30:20 -07:00
James R. Barlow
76276f61e5 Split out rotation related tests 2018-05-01 23:51:35 -07:00
James R. Barlow
bfd26e6ec6 Tests: confirm OCR layer copied 2018-05-01 23:16:41 -07:00
James R. Barlow
b5d7e9cbb0 Fix all issues with rotations
All tests now pass
2018-05-01 22:50:20 -07:00
James R. Barlow
a9abe13185 Remove the old tesseract pdf_renderer 2018-05-01 17:31:34 -07:00
James R. Barlow
6b315e8315 Add ability to disable cache 2018-05-01 15:52:00 -07:00
James R. Barlow
2131ad4670 Fix --remove-background error on PDFs with colormapped images
It's unclear how exactly a colormapped image gets to this spot given the
tendency of other image processing tools to flatten such images, but
someone made it happen, so now we make sure the image is okay.

Closes #262
2018-04-27 17:21:01 -07:00
James R. Barlow
219fe2155b test_pageinfo: remove duplicate import 2018-04-27 17:16:42 -07:00
James R. Barlow
0934905493 Don't suppress error message from config_notfound
Since it showed up in s390x bionic
2018-04-25 21:58:18 -07:00
James R. Barlow
df87e21c85 Add support for PDF/A-3
No ability to attach files however
2018-04-20 00:06:55 -07:00
Hugo
d761d80750 Use more standard __version__ rather than PILLOW_VERSION (#257) 2018-04-19 23:35:32 -07:00
James R. Barlow
0b10db91be Fix regression: Disable Ghostscript JPEG passthrough entirely 2018-04-17 17:00:24 -07:00
James R. Barlow
1a516b2af9 Fix regression: time stamp test suite failures 2018-04-17 16:59:21 -07:00
James R. Barlow
7368399f8b Clarify license of two test files - https://github.com/jbarlow83/OCRmyPDF/issues/254 2018-04-17 11:56:36 -07:00
James R. Barlow
34c78a892a Fix list table for tests/resources
[ci skip]
2018-04-15 23:52:19 -07:00
James R. Barlow
10aa59f674 v6.1.4 fix test suite regression with Ghostscript 9.23 2018-04-12 15:16:54 -07:00
James R. Barlow
ba0535e3fb Update test cache to account for unpaper --layout none change 2018-04-12 00:48:21 -07:00
James R. Barlow
49fa7f6b5c tesseract_cache: don't reveal host system file paths in manifest file 2018-04-12 00:47:28 -07:00
James R. Barlow
7a1cd39b21 Fix creation date metadata lost from input
Closes #247
2018-04-02 17:53:39 -07:00
James R. Barlow
4f6bffb477 Update copyrights 2018-03-31 11:54:38 -07:00
James R. Barlow
8d9be43c60 test_bookmarks_preserved won't raise ImportError any more
Due to trapping this in ocrmypdf.lib
2018-03-28 23:22:55 -07:00
James R. Barlow
40ef4f0bbe Add new argument --skip-repair to skip the repair step 2018-03-28 00:54:58 -07:00
James R. Barlow
5becfcf8ea Refactor fitz ImportError trap 2018-03-27 21:38:02 -07:00
James R. Barlow
a9bd494cc0 Merge branch 'optional-fitz' 2018-03-27 13:36:33 -07:00
James R. Barlow
6a4df78bc0 Add _naive_find_text to search for text when fitz is not available 2018-03-27 13:36:17 -07:00
James R. Barlow
530eae3898 Fix test_main missing file_claims_pdfa 2018-03-26 15:33:53 -07:00
James R. Barlow
3e444f6a90 Make fitz optional 2018-03-26 13:22:09 -07:00
James R. Barlow
45dbff6401 Fix table of contents not preserved in PDF/A 2018-03-26 02:23:19 -07:00
James R. Barlow
bc56b8e058 Move metadata tests to new test_metadata 2018-03-26 01:49:25 -07:00
James R. Barlow
746969207a Remove deprecated --pdf-renderer tess4, which was renamed to sandwich
Should have been cut in v6.0.0
2018-03-26 01:17:22 -07:00
James R. Barlow
230d301268 conftest: py3.5 path issue 2018-03-25 00:52:45 -07:00
James R. Barlow
a2d00f5f1d tess cache: fix tess3 error for -psm instead of --psm 2018-03-25 00:43:02 -07:00
James R. Barlow
8c1c61f207 test cache: fix Path + str error 2018-03-25 00:02:03 -07:00
James R. Barlow
77476965ae test cache: use .bin extension, fix .gitignore .gitattributes 2018-03-24 23:54:16 -07:00
James R. Barlow
ca51514046 Add test cache 2018-03-24 23:50:41 -07:00
James R. Barlow
8975b72a01 Fix test_testonly_pdf generating an output file in pwd 2018-03-24 22:34:35 -07:00
James R. Barlow
874ec6a87f Add missing fixture to test_unpaper 2018-03-24 22:24:14 -07:00