2895 Commits

Author SHA1 Message Date
James R. Barlow
9425506c2a Use pikepdf to handle paletted images
Removes all use of PyMuPDF in optimize
2018-05-18 17:44:29 -07:00
James R. Barlow
93b858afd1 Remove qpdf appimage support for now, check for pngquant 2018-05-18 16:24:33 -07:00
James R. Barlow
7b0a3ec365 Add notes for v7 (tag: v7.0.0b2) 2018-05-18 00:20:45 -07:00
James R. Barlow
083d442529 main: wording change 2018-05-18 00:20:24 -07:00
James R. Barlow
b52eb95cf8 optimize: use pikepdf to save PIL images
Eliminates another usage of PyMuPDF in the main path.
2018-05-18 00:18:44 -07:00
James R. Barlow
f4571e2508 Ensure we try to compress anything that's not compressed when saving 2018-05-17 22:05:01 -07:00
James R. Barlow
b06ef03aac pipeline: use the resolution of the OCR image rather than recalculating
(Recalculating would fail if the image is not centered.)
2018-05-17 16:51:53 -07:00
James R. Barlow
1d1962a106 weave: fix rescaling logic
rotation % 90 == 0 is always true.
2018-05-17 16:50:01 -07:00
James R. Barlow
4b98e9ff08 weave: if we don't have textonly_pdf, delete instruction to draw image 2018-05-17 16:49:20 -07:00
James R. Barlow
f83ca5d8ac weave: whitespace 2018-05-17 16:06:36 -07:00
James R. Barlow
95cb4d22d7 pipeline: make /Info from indirect object as required 2018-05-17 16:06:13 -07:00
James R. Barlow
0c279b01a4 Fix test failure on missing JobContext (tag: v7.0.0b1) 2018-05-17 01:16:58 -07:00
James R. Barlow
3b820ffa7b test_metadata: change from xfail to skipif without fitz 2018-05-17 00:14:57 -07:00
James R. Barlow
35cb416563 pipeline: remove fitz-based attempt to repair table of contents
Prior to unsplit, if we were rebuilding the PDF we'd lose the
table of contents. With unsplit we keep the original file and patch
the table of contents as necessary, and that works fine.
This remaining bit of code from PyMuPDF actually damages the
table of contents and removing it fixes the test suite. G'bye.
2018-05-16 23:24:57 -07:00
James R. Barlow
cdb737259c pipeline: remove old page merge strategies 2018-05-16 22:16:54 -07:00
James R. Barlow
0843b5939c pipeline: Move weave* to its own file 2018-05-16 22:08:31 -07:00
James R. Barlow
2b5f23a2d1 Add code to repair ToC with pikepdf 2018-05-16 21:39:23 -07:00
James R. Barlow
5e20d1d554 metadata: Fix failing test on __getitem__['/CreationDate'] 2018-05-16 13:46:07 -07:00
James R. Barlow
18595ca86a Use pikepdf for get_pdfmark
It does fine.
2018-05-16 12:24:35 -07:00
James R. Barlow
3e269fa188 Ubuntu 14.04 has a qpdf 8.0.2 backport, making life easier 2018-05-15 21:43:19 -07:00
James R. Barlow
65405c2cb9 Try getting qpdf from Ubuntu 18.04 2018-05-15 21:27:27 -07:00
James R. Barlow
442cf8897a Travis: maybe upgrading wheel? 2018-05-15 18:12:35 -07:00
James R. Barlow
d5fb275e9e Travis: hack in qpdf appimage version
qpdf from appimage does not report its version with --version if renamed
or accessed via symlink. Use an environment variable to supply it
where needed.
2018-05-15 17:45:58 -07:00
James R. Barlow
e60aec81ca Travis: why can't we use qpdf appimage? 2018-05-15 16:59:16 -07:00
James R. Barlow
398e9e535e optimize: Changed pikepdf API 2018-05-15 16:29:57 -07:00
James R. Barlow
08bf651ef2 Refactor JBIG2 path for non-CCITT monochrome images 2018-05-15 15:32:15 -07:00
James R. Barlow
6171de41bf optimize: move a lot of image scanning code to pikepdf 2018-05-14 22:21:53 -07:00
James R. Barlow
f0a56592e2 Pull JobContext out of pipeline.py to avoid circular reference 2018-05-14 14:01:25 -07:00
James R. Barlow
87a7d4d1a8 Another fitz failure - incorrect object reference introduced
MuPDF/fitz changed some font references to point to table of contents
entries, corrupting the page.  It no longer gets to save.
2018-05-14 13:58:49 -07:00
James R. Barlow
05287902a2 Travis: again 2018-05-13 11:02:25 -07:00
James R. Barlow
96e453feb6 Travis: Tweak setup so it can run 2018-05-13 01:21:24 -07:00
James R. Barlow
9c0fa9fc04 Travis: again 2018-05-13 01:17:04 -07:00
James R. Barlow
3bde0715b0 Move qpdf to before_script 2018-05-13 01:01:48 -07:00
James R. Barlow
e2ec3d8b9b Travis: adjust qpdf appimage 2018-05-13 00:53:31 -07:00
James R. Barlow
ad91eaf8a7 Travis: try using qpdf appimage to speed up build 2018-05-13 00:42:48 -07:00
James R. Barlow
b6d30214fd PyMuPDF 1.13.4 looks good, use it 2018-05-12 12:35:46 -07:00
James R. Barlow
c4ab01d63d Fix "AttributeError: 'ImageInfo' object has no attribute '_type'"
Also deal with 'fixme' imagemask comment.

Also fix bpc incorrectly set to 8 by default on stencil masks.
2018-05-12 12:14:57 -07:00
James R. Barlow
4ba3b3f55a Fix rotate_pages_threshold test failure 2018-05-12 11:47:46 -07:00
James R. Barlow
52d2706a9e optimize: Fix error causing many images to be skipped 2018-05-12 01:37:30 -07:00
James R. Barlow
964afc69f6 leptonica: ErrorTrap is an implementation detail 2018-05-12 01:21:45 -07:00
James R. Barlow
3ddf545ccd optimize: leptonica can fail to open PNG
ERROR - Info in pixReadStreamPng: converting (cmap + alpha) ==> RGBA
Error in pixReadStreamPng: spp == 1, cmap, trans array, invalid depth: 4

To investigate later....
2018-05-12 01:21:19 -07:00
James R. Barlow
f9374733bb optimize: process ICCBased images that declare an /Alternate we recognize 2018-05-12 00:43:36 -07:00
James R. Barlow
5930135f45 optimize: Refactor naming helpers 2018-05-12 00:42:24 -07:00
James R. Barlow
f03f6bc128 optimize: document problem with transcode free compressed image data 2018-05-11 23:43:06 -07:00
James R. Barlow
6c50c70235 Try to optimize paletted images 2018-05-11 23:42:26 -07:00
James R. Barlow
8790fc2c1b optimize: add knobs to control image quality but don't show the user yet 2018-05-11 23:41:49 -07:00
James R. Barlow
f86c4fccf4 optimize: don't alter >8 bpc images 2018-05-11 22:31:24 -07:00
James R. Barlow
7d0785e9ed main: do better parameter validation 2018-05-11 22:31:09 -07:00
James R. Barlow
2cac88162c Ignore masks when deciding what color to rasterize at 2018-05-11 21:27:57 -07:00
James R. Barlow
4809627d8a Fix jbig2enc name 2018-05-11 17:51:08 -07:00