2895 Commits

Author SHA1 Message Date
James R. Barlow
76276f61e5 Split out rotation related tests 2018-05-01 23:51:35 -07:00
James R. Barlow
bfd26e6ec6 Tests: confirm OCR layer copied 2018-05-01 23:16:41 -07:00
James R. Barlow
d787e1ea0f ghostscript.py not saved in last commit
Given importance of last one, confirmed that when the file is saved all tests pass too.
Passing is invariant with this change.
2018-05-01 22:59:22 -07:00
James R. Barlow
b5d7e9cbb0 Fix all issues with rotations
All tests now pass
2018-05-01 22:50:20 -07:00
James R. Barlow
f3b6d9dcdf Fix a comment about Tesseract behavior in certain versions 2018-05-01 21:31:09 -07:00
James R. Barlow
a9abe13185 Remove the old tesseract pdf_renderer 2018-05-01 17:31:34 -07:00
James R. Barlow
6b315e8315 Add ability to disable cache 2018-05-01 15:52:00 -07:00
James R. Barlow
37677de884 Fix regressions: pdfa.ps not used, PDF/A failures, handling of text layers with no font 2018-05-01 15:51:46 -07:00
James R. Barlow
c7387de325 Fix auto rotate 2018-05-01 15:18:28 -07:00
James R. Barlow
2495b1e038 Refactor find font, get test cases working again 2018-05-01 14:48:41 -07:00
James R. Barlow
073ee52ce7 Use hocr and weave; eliminate old combine layers and merge pages 2018-05-01 14:21:53 -07:00
James R. Barlow
54150a14e9 Further elimination of tesseract renderer special casing
We don't need to keep a "skip page" around anymore since
skipping means just not grafting on the text layer.
2018-05-01 13:36:20 -07:00
James R. Barlow
88ff091cce Unify tesseract and sandwich renderer paths
Since the new weaving method copies the font and content
stream from the Tesseract PDF, it doesn't matter if Tesseract
happens to have an image or not.
If Tesseract is text-only capable we use that feature for efficiency,
but ignore the image either way.
2018-05-01 13:24:20 -07:00
James R. Barlow
e87a5776f1 Remove now-unnecessary code to rotate pages
Track only the decision to change rotation.
2018-05-01 13:01:25 -07:00
James R. Barlow
0806ce6406 Fix rotation for unsplit (modulo --rotate-pages) 2018-04-30 20:58:42 -07:00
James R. Barlow
6409894a71 feature/unsplit-try-imagerotate 2018-04-30 20:48:59 -07:00
James R. Barlow
e7286f6129 Unsplit now works with multipage, --force-ocr 2018-04-30 14:46:20 -07:00
James R. Barlow
2ab94b3151 unsplit: it's alive
First successful file output.
2018-04-28 01:57:41 -07:00
James R. Barlow
7ee90890ec Add copying of essential information from Tesseract textonly 2018-04-27 23:19:08 -07:00
James R. Barlow
383e726d65 Expand size growth reasons to other arguments that trigger transcoding 2018-04-27 19:34:57 -07:00
James R. Barlow
e046f70642 Set OMP_THREAD_LIMIT unconditionally, for pngquant 2018-04-27 19:19:30 -07:00
James R. Barlow
2131ad4670 Fix --remove-background error on PDFs with colormapped images
It's unclear how exactly a colormapped image gets to this spot given the
tendency of other image processing tools to flatten such images, but
someone made it happen, so now we make sure the image is okay.

Closes #262
2018-04-27 17:21:01 -07:00
James R. Barlow
219fe2155b test_pageinfo: remove duplicate import 2018-04-27 17:16:42 -07:00
James R. Barlow
4209034d20 Add gpg key to issue template 2018-04-27 15:51:26 -07:00
James R. Barlow
abcae0c2a4 Fix helpers.py again 2018-04-25 22:10:51 -07:00
James R. Barlow
0934905493 Don't suppress error message from config_notfound
Since it showed up in s390x bionic
2018-04-25 21:58:18 -07:00
James R. Barlow
11cd6201d9 helpers: fix missing call to complain()
In practice this is probably unreachable.
2018-04-25 21:57:50 -07:00
James R. Barlow
8d2a917676 Page unsplit, development 2018-04-25 21:56:43 -07:00
James R. Barlow
44b4afa534 Begin conversion from page splitting to page markers 2018-04-23 22:57:50 -07:00
James R. Barlow
775be3933c Cherrypick merge_pages unification 2018-04-20 23:08:15 -07:00
James R. Barlow
df87e21c85 Add support for PDF/A-3
No ability to attach files however
2018-04-20 00:06:55 -07:00
Hugo
d761d80750 Use more standard __version__ rather than PILLOW_VERSION (#257) 2018-04-19 23:35:32 -07:00
James R. Barlow
8052019dde optimize: fix reporting of jbig2 groups 2018-04-19 01:54:44 -07:00
James R. Barlow
a3d8950088 optimize: Don't save JPEGs if larger 2018-04-19 01:25:49 -07:00
James R. Barlow
004f5d3bf1 optimize: further improve decodeparms handling 2018-04-18 15:52:25 -07:00
James R. Barlow
f5d308a156 optimize: refactor tricky /Filter and /DecodeParms handling 2018-04-18 15:30:21 -07:00
James R. Barlow
3869996758 optimize: jbig2 error 2018-04-18 01:36:38 -07:00
James R. Barlow
cdb2107c4e optimize: jbig2 fix 2018-04-18 01:31:35 -07:00
James R. Barlow
4db2b3413b optimize: more robustness 2018-04-18 01:25:34 -07:00
James R. Barlow
b2f31bec79 Make optimize a lot safer 2018-04-18 00:20:06 -07:00
James R. Barlow
78f9f4a266 Be more defensive about accessing 2018-04-18 00:11:39 -07:00
James R. Barlow
ad6087c342 optimize: more fixes 2018-04-17 23:58:10 -07:00
James R. Barlow
0d6ef430de optimize: fix "length not defined" 2018-04-17 23:38:00 -07:00
James R. Barlow
a5942209e8 optimize: fix error on missing /Filter 2018-04-17 23:27:56 -07:00
James R. Barlow
9a60694cfc optimize: ccitt header fixes
Changed to match TIFF spec's use of unsigned types, eliminated check for
/Columns.

There is some complex behavior for /Width != /Columns and
(/Width, /Columns) mod 8 != 0
that is not described well in the PDF spec.
2018-04-17 23:27:25 -07:00
James R. Barlow
4bf13f4737 optimize: be less chatty 2018-04-17 23:25:41 -07:00
James R. Barlow
9e89b75186 Merge v6.1.5 2018-04-17 22:51:13 -07:00
James R. Barlow
0b10db91be Fix regression: Disable Ghostscript JPEG passthrough entirely v6.1.5 2018-04-17 17:00:24 -07:00
James R. Barlow
1a516b2af9 Fix regression: time stamp test suite failures 2018-04-17 16:59:21 -07:00
James R. Barlow
076363d78e Disable JPEG passthrough for Ghostscript 9.23
Seems to corrupt JPEGs involved in image masks?
2018-04-17 16:31:03 -07:00