2676 Commits

Author SHA1 Message Date
James R. Barlow
7baaf00a38 Fix wrong return code tested [tags: v7.0.0rc4, v7.0.0rc3] 2018-07-05 13:49:22 -07:00
James R. Barlow
5cc23dbf24 pdfinfo: more robustness 2018-07-04 17:12:30 -07:00
James R. Barlow
216d60ea2c pdfinfo: improve the regex 2018-07-04 00:59:32 -07:00
James R. Barlow
8b0496d35e Fix invalid XML characters choking parser 2018-07-03 22:51:59 -07:00
James R. Barlow
e44001641c Return a distinct error code if PDF/A fails 2018-07-03 16:59:03 -07:00
James R. Barlow
47885f4230 Remove initial qpdf.repair
Since pikepdf is doing the work, the initial repair takes time and gives
little benefit.

It turns out not to be worthwhile to save the results of PdfInfo parsing,
because the time to save them seems to exceed the cost of recalculating
them since the "weave" code, at least for small files.
2018-07-03 16:50:05 -07:00
James R. Barlow
921767e82e ocrmypdf.exec: trap FileNotFoundError too 2018-07-03 00:05:49 -07:00
James R. Barlow
85f96b7fb0 Add test to optimize if jbig2 is present 2018-07-02 23:49:11 -07:00
James R. Barlow
890c7fd0f6 optimize: allow modification of quality settings in command line mode 2018-07-02 23:48:51 -07:00
James R. Barlow
39c44bdd2f Don't use --optimize in test since jbig2enc is not always installed 2018-07-02 23:48:23 -07:00
James R. Barlow
5f99f7f6ca Upgrade to Py3.7 locally and resolve a few issues 2018-07-02 23:47:51 -07:00
James R. Barlow
4f864bce98 Update macOS Brewfile 2018-07-02 22:25:46 -07:00
James R. Barlow
2974929b26 Make jpeg/png quality tunable args 2018-07-02 22:22:59 -07:00
James R. Barlow
db837aa55c Improve release notes 2018-07-02 16:48:33 -07:00
James R. Barlow
7200623007 Fix installation for Python 3.7
Need to use private fork of ruffus for Python 3.7. Backward compatible with Python 3.6 for ruffus 2.6.3

Disable locale checking for 3.7 since the various fixes in that release should make it unnecessary.
2018-07-02 16:47:14 -07:00
James R. Barlow
73e02ae4ea Hopefully work around Py3.5 marshal error
https://github.com/eliben/pycparser/issues/251
2018-06-29 12:54:48 -07:00
James R. Barlow
d4cbef9457 Update test cache with naming rule change 2018-06-29 12:04:20 -07:00
James R. Barlow
ed8ff79e10 Optimize some of our bigger test files
Only partially optimize multipage.pdf so that it hopefully
improves speed of test suite without being useless as an
optimization test.
2018-06-29 00:35:49 -07:00
James R. Barlow
e725f64b6a Add test case to ensure mono is not inverted 2018-06-29 00:25:11 -07:00
James R. Barlow
0029cc4fe7 optimize: fix PNGs that were reduced to 1-bit being inverted
At some point the color gets flipped, so we have to flip it again
for mono.

Incidentally, this exposed an unused optimization. Should change the
first pass to scan all images and record monochrome xrefs, then optimize
JPEG and PNG, possibly adding mono images to the monochrome queue.
Finally, do JBIG2 optimization.
2018-06-29 00:09:20 -07:00
James R. Barlow
9637696a54 Fix test resources naming inconsistency 2018-06-28 23:37:14 -07:00
James R. Barlow
02b3ca6862 Compress test images more heavily 2018-06-28 21:40:12 -07:00
James R. Barlow
bc90f40a8f Replace all Pix.read with Pix.open 2018-06-28 15:13:26 -07:00
James R. Barlow
3d727ff4c0 Fix leptonica remove_colormap, which was replaced with a no-op at some point 2018-06-28 15:11:51 -07:00
James R. Barlow
b0eacd6586 Add Python 3.7 support 2018-06-28 13:57:45 -07:00
James R. Barlow
7795701595 Merge branch 'test/ignore-masks' 2018-06-28 13:05:45 -07:00
James R. Barlow
bf214eecb3 Use newer pikepdf API for objgen 2018-06-28 12:59:01 -07:00
James R. Barlow
434b96d734 optimize: skip incremental images if any
These are fairly rare
2018-06-24 00:18:48 -07:00
James R. Barlow
b9dc109892 optimize: use new pikepdf api for objgen 2018-06-24 00:16:28 -07:00
James R. Barlow
1f40a70554 Use qpdf 8.0.2 backport, force old pytest-timeout to fix build [tag: v6.2.1] 2018-06-23 03:14:18 -07:00
James R. Barlow
e14ffbf03f v6.2.1 release notes 2018-06-23 03:01:54 -07:00
James R. Barlow
25a1dde57c Fix recent versions of tesseract not registering as textonly_pdf
This change happened sometime after the 4.0.0-beta1 release in
Ubuntu 18.04
2018-06-23 02:59:22 -07:00
James R. Barlow
bf96171b65 Ignore whether or not textonly_pdf was used in cache
The difference doesn't matter in 7.0.0 anymore.
2018-06-23 02:58:26 -07:00
James R. Barlow
b7ff821fa3 Fix recent versions of tesseract not registering as textonly_pdf
This change happened sometime after the 4.0.0-beta1 release in
Ubuntu 18.04
2018-06-23 02:55:58 -07:00
James R. Barlow
b81daf71d1 Regenerate test cache 2018-06-23 02:02:58 -07:00
James R. Barlow
faad1fc58a Reactivate two tests that weren't using their fixtures properly 2018-06-23 01:54:09 -07:00
James R. Barlow
6f48181a56 Disable a pylint 2018-06-23 01:53:04 -07:00
James R. Barlow
f1305e5a37 pdfa: fix function using closure when it shouldn't 2018-06-23 01:52:36 -07:00
James R. Barlow
f0e0f92776 leptonica: fix variables defined on class outside __init__ 2018-06-23 01:51:55 -07:00
James R. Barlow
807c8b0726 Trailing whitespace 2018-06-23 01:51:19 -07:00
James R. Barlow
6333ec928c Cleanup some cases where log was lazy and should be 2018-06-23 01:50:27 -07:00
James R. Barlow
cd220d9ed9 pipeline: search_window variable not actually used 2018-06-23 01:48:57 -07:00
James R. Barlow
76532649b8 tesseract.get_orientation: removed unused language parameter 2018-06-23 01:48:24 -07:00
James R. Barlow
b0dbaeafc5 Cleanup unused imports 2018-06-23 01:47:53 -07:00
James R. Barlow
2530d1791b Fix several pylint errors and warnings 2018-06-23 00:54:22 -07:00
James R. Barlow
94150f414a Remove qpdf.merge
We no longer need to merge pages this way. Much of the functionality
was there to implement page splitting without hitting ulimit which
will be fixed in qpdf > 8.0.2. The tests were expensive to run.

Also remove pytest-timeout since it breaks the Linux build.
2018-06-23 00:45:03 -07:00
James R. Barlow
54e74f84cc Remove special handling of TypeError from ruffus
split_pages would still run if repair_pdf failed, for some reason.
Since we are no longer splitting pages this is vestigial.
2018-06-23 00:41:20 -07:00
James R. Barlow
76e7e8dbbb Replace several uses of str(path) with fspath(path)
Helps make it more explicit. Did not do this to tests because use of paths
is more involved there.
2018-06-22 21:00:47 -07:00
James R. Barlow
324598e992 Remove helpers.universal_open()
This helper function only had a single usage. It was always an awkward
way to support Python 3.5 that I'd forget to use.
2018-06-22 17:56:20 -07:00
James R. Barlow
9e765ddf46 Rename _optimize to optimize.py 2018-06-22 17:51:57 -07:00