2676 Commits

Author SHA1 Message Date
James R. Barlow
6ac9e92f17 Fix PEP8 docstring convention misuse in a few places 2018-06-22 17:51:25 -07:00
James R. Barlow
faaa4a1def Ghostscript, PDF/A: support pathlib 2018-06-22 17:45:10 -07:00
James R. Barlow
0aa51f0f3a Remove fitz from Travis 2018-06-18 15:38:41 -07:00
James R. Barlow
73431d9761 Remove obsolete _naive_find_text 2018-06-13 14:00:50 -07:00
James R. Barlow
45cb4525cf Remove other references to PyMuPDF 2018-06-13 01:02:53 -07:00
James R. Barlow
8c84c515b6 Use Ghostscript for text region detection
Ghostscript txtwrite seems to be quite effective at the task.

Eliminates dependency on fitz
2018-06-13 00:58:09 -07:00
James R. Barlow
1dfbbdebf4 Adjust for pikepdf API change v7.0.0rc2 2018-06-08 22:47:56 -07:00
James R. Barlow
740918daee Create debug envvar to override Creator or Producer
Note that Ghostscript always overrides Producer
2018-06-06 23:17:28 -07:00
jbarlow83
1d10eac764 Add wiki link to issue template
[ci skip]
2018-06-06 12:59:59 -07:00
jbarlow83
3f868118cd Remove gpg
[ci skip]
2018-06-06 12:58:02 -07:00
James R. Barlow
04d79b15b4 optimize: fix error in Py3.5 v7.0.0rc1 2018-06-06 12:25:32 -07:00
James R. Barlow
a13c398c06 Suppress some spurious tesseract errors 2018-06-05 23:26:28 -07:00
James R. Barlow
e3b3f716ee optimize: use tempdir for cmdline invocation 2018-06-05 21:20:54 -07:00
James R. Barlow
cf43c06f46 Use python-xmp-toolkit for xmp check
Eliminates PyPDF2 and defusedxml as dependencies.
2018-05-29 22:00:52 -07:00
James R. Barlow
74a5a18607 Tweak release notes v7.0.0b4 2018-05-28 14:52:06 -07:00
James R. Barlow
44241c6dd5 Travis: remove deploy to testpypi since it's broken 2018-05-27 01:49:18 -07:00
James R. Barlow
8fff496ffd Fix Py3.5 not understanding os.path.exists(Path(...)) v7.0.0b3 2018-05-26 22:55:22 -07:00
James R. Barlow
edf75c519c Update v7 release notes 2018-05-26 02:08:49 -07:00
James R. Barlow
9608b22d34 Remove all uses of PyPDF2 except PDF/A check
Leave PDF/A check alone for now, since pikepdf has no equivalent.
2018-05-26 02:07:18 -07:00
James R. Barlow
8ba4968c48 pdfinfo: more robustness 2018-05-26 01:54:25 -07:00
James R. Barlow
ffdd78f1a5 pdfinfo: Fix text_operators type not changed in related commit 2018-05-25 02:10:39 -07:00
James R. Barlow
ad9f8ca78e pdfinfo: reinstate stack normalization for q/Q 2018-05-25 01:28:26 -07:00
James R. Barlow
78a686ecb4 Consider qpdf behavior on algo4 a pass
qpdf opens files with null user password, so do the same.
2018-05-25 00:33:31 -07:00
James R. Barlow
59e786eb3c Remove old code to deal with single page only things 2018-05-25 00:32:55 -07:00
James R. Barlow
6d0461435f Use OperandGrouper whitelist 2018-05-24 22:52:33 -07:00
James R. Barlow
0a04a60f69 Document need for pdfinfo to be pickleable 2018-05-24 22:24:13 -07:00
James R. Barlow
68d8642988 Found out this test was extremely slow - no reason to actually use a large file 2018-05-24 22:22:51 -07:00
James R. Barlow
16f70ff054 Main changeset for pikepdf-based refactor pdfinfo 2018-05-24 22:22:01 -07:00
James R. Barlow
c00aeafff0 Add scratch file 2018-05-24 22:20:15 -07:00
James R. Barlow
83f35e00f3 Start removing PyPDF2 2018-05-21 01:28:21 -07:00
James R. Barlow
786a2ad65a Make optimize test do a little more 2018-05-18 17:50:39 -07:00
James R. Barlow
9425506c2a Use pikepdf to handle paletted images
Removes all use of PyMuPDF in optimize
2018-05-18 17:44:29 -07:00
James R. Barlow
93b858afd1 Remove qpdf appimage support for now, check for pngquant 2018-05-18 16:24:33 -07:00
James R. Barlow
7b0a3ec365 Add notes for v7 v7.0.0b2 2018-05-18 00:20:45 -07:00
James R. Barlow
083d442529 main: wording change 2018-05-18 00:20:24 -07:00
James R. Barlow
b52eb95cf8 optimize: use pikepdf to save PIL images
Eliminates another usage of PyMuPDF in the main path.
2018-05-18 00:18:44 -07:00
James R. Barlow
f4571e2508 Ensure we try to compress anything that's not compressed when saving 2018-05-17 22:05:01 -07:00
James R. Barlow
b06ef03aac pipeline: use the resolution of the OCR image rather than recalculating
(Recalculating would fail if the image is not centered.)
2018-05-17 16:51:53 -07:00
James R. Barlow
1d1962a106 weave: fix rescaling logic
rotation % 90 == 0 is always true.
2018-05-17 16:50:01 -07:00
James R. Barlow
4b98e9ff08 weave: if we don't have textonly_pdf, delete instruction to draw image 2018-05-17 16:49:20 -07:00
James R. Barlow
f83ca5d8ac weave: whitespace 2018-05-17 16:06:36 -07:00
James R. Barlow
95cb4d22d7 pipeline: make /Info from indirect object as required 2018-05-17 16:06:13 -07:00
James R. Barlow
0c279b01a4 Fix test failure on missing JobContext v7.0.0b1 2018-05-17 01:16:58 -07:00
James R. Barlow
3b820ffa7b test_metadata: change from xfail to skipif without fitz 2018-05-17 00:14:57 -07:00
James R. Barlow
35cb416563 pipeline: remove fitz-based attempt to repair table of contents
Prior to unsplit, if we were rebuilding the PDF we'd lose the
table of contents. With unsplit we keep the original file and patch
the table of contents as necessary, and that works fine.
This remaining bit of code from PyMuPDF actually damages the
table of contents and removing it fixes the test suite. G'bye.
2018-05-16 23:24:57 -07:00
James R. Barlow
cdb737259c pipeline: remove old page merge strategies 2018-05-16 22:16:54 -07:00
James R. Barlow
0843b5939c pipeline: Move weave* to its own file 2018-05-16 22:08:31 -07:00
James R. Barlow
2b5f23a2d1 Add code to repair ToC with pikepdf 2018-05-16 21:39:23 -07:00
James R. Barlow
5e20d1d554 metadata: Fix failing test on __getitem__['/CreationDate'] 2018-05-16 13:46:07 -07:00
James R. Barlow
18595ca86a Use pikepdf for get_pdfmark
It does fine.
2018-05-16 12:24:35 -07:00
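
Note on 8fff496ffd (Fix Py3.5 not understanding os.path.exists(Path(...))): os.path.exists() only accepts path-like objects such as pathlib.Path from Python 3.6 onward. The sketch below shows one way to stay compatible with 3.5 by converting to str first. The helper name path_exists is hypothetical and is not taken from the repository; the actual fix in that commit may differ.

```python
import os
import tempfile
from pathlib import Path


def path_exists(path):
    """Return True if path exists; accepts str or pathlib.Path.

    Hypothetical helper for illustration only. On Python 3.5,
    os.path.exists() rejects Path objects, so convert to str first.
    """
    return os.path.exists(str(path))


# Exercise the helper against one file that exists and one that does not.
with tempfile.TemporaryDirectory() as tmp:
    present = Path(tmp) / "present.txt"
    present.write_text("x")
    missing = Path(tmp) / "missing.txt"

    found_present = path_exists(present)
    found_missing = path_exists(missing)
```

Calling Path.exists() directly on the Path object is another option that also works on 3.5.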