2895 Commits

Author SHA1 Message Date
James R. Barlow
5fde214290 Update notes for v6.1.5 2018-04-17 15:23:35 -07:00
James R. Barlow
a620724d6a Fix PDF/A validation failure due to timezone being omitted from /ModDate 2018-04-17 15:16:48 -07:00
James R. Barlow
640b953ec7 Fix PDF/A validation failure due to timezone being omitted from /ModDate 2018-04-17 14:55:32 -07:00
James R. Barlow
a009ca7597 Disable JPEG passthrough for Ghostscript 9.23
Seems to corrupt JPEGs involved in image masks?
2018-04-17 13:54:34 -07:00
James R. Barlow
7368399f8b Clarify license of two test files - https://github.com/jbarlow83/OCRmyPDF/issues/254 2018-04-17 11:56:36 -07:00
James R. Barlow
c974aec934 Search for image masks too 2018-04-17 02:06:08 -07:00
James R. Barlow
3033f03f64 Iterate images with pikepdf / fix mono PNG corruption
To work around PNG corruption problem in PyMuPDF for monochrome images,
extract and save monochrome CCITT with synthetic TIFF header.

Works better but currently skips /ImageMask due to qpdf
implementation, which affects many files.
2018-04-17 01:50:37 -07:00
James R. Barlow
72723e0bb5 optimize: be quieter 2018-04-16 18:06:02 -07:00
James R. Barlow
2fb6ab3939 Trap writePNG error 2018-04-16 17:29:10 -07:00
James R. Barlow
25c1c160b8 Move optimize to new file 2018-04-16 17:22:06 -07:00
James R. Barlow
7e92895471 Parallelize pngquant 2018-04-16 12:37:51 -07:00
James R. Barlow
d291d48991 PNG palette: parse PDF string from leptonica instead
Seems better to accept whatever leptonica produces rather than make detailed
assumptions about how it encodes the palette.

Experimented with setting FlateDecode on the palette but it seems to
expand it.
2018-04-16 12:16:13 -07:00
James R. Barlow
0e6b8042b0 Implement PNG palettization 2018-04-16 11:18:52 -07:00
James R. Barlow
34c78a892a Fix list table for tests/resources
[ci skip]
2018-04-15 23:52:19 -07:00
James R. Barlow
9d28879505 Update Ubuntu 14.04 instructions
Closes #252
2018-04-14 17:30:33 -07:00
James R. Barlow
2482296e2b hocr: avoid division by zero
Issue #253 - PDF that produces the error is not available, but if font_width
is zero, chances are the text is nonprinting characters, so suppress it.
2018-04-14 17:24:21 -07:00
James R. Barlow
f755fb76ee Try pngquant 2018-04-14 01:37:14 -07:00
James R. Barlow
c61b5dcb62 Fix PDF/A validation error from setting /Predictor 0 2018-04-14 01:36:46 -07:00
James R. Barlow
fae893b9d9 Reinstate transcoding of PNG 2018-04-14 00:19:24 -07:00
James R. Barlow
10aadefd6a Document return codes 2018-04-14 00:18:58 -07:00
James R. Barlow
e75b6280fd Try reading compressed data directly to see if Leptonica will add predictor
Turns out it does not transcode at all in this case, so probably going
to revert to transcoding PNG -> PNG. However if pngquant or similar is
done, this API will still be useful.
2018-04-13 23:55:23 -07:00
James R. Barlow
8c4023165a Release L_COMP_DATA properly 2018-04-13 23:53:41 -07:00
James R. Barlow
b7d403f106 Deprecate Pix.read() behaving as an open function 2018-04-13 23:52:46 -07:00
James R. Barlow
b069de0caa Use Leptonica to rewrite all PNGs with predictor
Leptonica does a better job of encoding them than Ghostscript, about -15%.
For a test file 450k worth of PNGs was reduced to 388k with no loss of quality.
2018-04-13 16:35:50 -07:00
James R. Barlow
136da74bfa Update branch with v6.1.4 2018-04-13 12:57:21 -07:00
James R. Barlow
7fc897e6dc Fix NameError 'ghostscript' v6.1.4 2018-04-12 21:24:05 -07:00
James R. Barlow
9b731d63b8 Set Ghostscript -sColorConversionStrategy the way old/new versions expect 2018-04-12 16:28:48 -07:00
James R. Barlow
10aa59f674 v6.1.4 fix test suite regression with Ghostscript 9.23 2018-04-12 15:16:54 -07:00
James R. Barlow
1f7837e7b1 v6.1.4 release notes update 2018-04-12 00:55:45 -07:00
James R. Barlow
ba0535e3fb Update test cache to account for unpaper --layout none change 2018-04-12 00:48:21 -07:00
James R. Barlow
49fa7f6b5c tesseract_cache: don't reveal host system file paths in manifest file 2018-04-12 00:47:28 -07:00
James R. Barlow
c95db246d4 v6.1.4 merge 2018-04-11 15:58:00 -07:00
James R. Barlow
1ba93371ce docs: Update installation to reflect qpdf 7.0.0 requirement 2018-04-11 15:40:50 -07:00
James R. Barlow
fedbbdb575 Travis: compile qpdf from source
The older version in Travis's Ubuntu 14.04 can't pass the test suite anymore.
2018-04-11 15:40:45 -07:00
James R. Barlow
85ebba72bc Fix setup.py syntax 2018-04-10 18:30:48 -07:00
James R. Barlow
b6cd436d5d setup: Blacklist Pillow 5.1.0 on macos
https://github.com/python-pillow/Pillow/issues/3068
2018-04-10 18:15:37 -07:00
James R. Barlow
ec170c7e1e Travis: use setup.py for requirements, don't override with .txt 2018-04-10 17:52:19 -07:00
James R. Barlow
f6399eb90f optimize: use Leptonica to compact JPEGs
Pillow could do it too, but Leptonica is somewhat more PDF aware.
2018-04-10 17:45:05 -07:00
James R. Barlow
77f2448e59 Leptonica: add L_COMP_DATA compressed data manager 2018-04-10 17:44:03 -07:00
James R. Barlow
3d69b46fca Release notes 2018-04-10 15:53:02 -07:00
James R. Barlow
4b6153ad18 Use defusedxml for XML parsing when reading XMP 2018-04-10 14:25:13 -07:00
James R. Barlow
75d37eb103 docs: expand ocr of image usage 2018-04-09 13:06:09 -07:00
James R. Barlow
11b6f77df0 unpaper: close images on error paths 2018-04-09 13:05:12 -07:00
James R. Barlow
db8b0319dd get_version: repeat system error messages if the process exits with a signal 2018-04-09 13:04:51 -07:00
James R. Barlow
c9dd330766 JBIG2: refactor, don't recompress existing JBIG2 2018-04-09 13:04:10 -07:00
James R. Barlow
e40228102c JBIG2: Streams created in this manner are already indirect objects 2018-04-06 17:11:17 -07:00
James R. Barlow
7889c6fb4c Parallelize JBIG2 execution with thread pools 2018-04-06 17:00:23 -07:00
James R. Barlow
6eb1773110 Fix JBIG2Globals included multiple times in output 2018-04-06 17:00:03 -07:00
James R. Barlow
1d25823746 Implement functional, single threaded optimize
Passes verapdf
2018-04-06 15:49:16 -07:00
James R. Barlow
d1d4f1e198 Add issue links to release notes 2018-04-06 14:52:40 -07:00