James R. Barlow
25a1dde57c
Fix recent versions of tesseract not registering as textonly_pdf
This change happened sometime after the 4.0.0-beta1 release in
Ubuntu 18.04
2018-06-23 02:59:22 -07:00
James R. Barlow
bf96171b65
Ignore whether or not textonly_pdf was used in cache
The difference doesn't matter in 7.0.0 anymore.
2018-06-23 02:58:26 -07:00
James R. Barlow
b7ff821fa3
Fix recent versions of tesseract not registering as textonly_pdf
This change happened sometime after the 4.0.0-beta1 release in
Ubuntu 18.04
2018-06-23 02:55:58 -07:00
James R. Barlow
b81daf71d1
Regenerate test cache
2018-06-23 02:02:58 -07:00
James R. Barlow
faad1fc58a
Reactivate two tests that weren't using their fixtures properly
2018-06-23 01:54:09 -07:00
James R. Barlow
6f48181a56
Disable a pylint
2018-06-23 01:53:04 -07:00
James R. Barlow
f1305e5a37
pdfa: fix function using closure when it shouldn't
2018-06-23 01:52:36 -07:00
James R. Barlow
f0e0f92776
leptonica: fix variables defined on class outside __init__
2018-06-23 01:51:55 -07:00
James R. Barlow
807c8b0726
Trailing whitespace
2018-06-23 01:51:19 -07:00
James R. Barlow
6333ec928c
Clean up some cases where logging was not lazy and should be
2018-06-23 01:50:27 -07:00
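A minimal sketch of what "lazy" logging usually means, assuming the cleanup above refers to the common convention (also checked by pylint) of passing format arguments to the logger instead of pre-formatting the string. The logger name `lazy_demo` and the `Expensive` class are hypothetical and are not taken from the project.

```python
import logging


class Expensive:
    """Hypothetical object that counts how often it is rendered to text."""

    def __init__(self):
        self.renders = 0

    def __str__(self):
        self.renders += 1
        return "expensive value"


log = logging.getLogger("lazy_demo")
log.setLevel(logging.WARNING)  # DEBUG messages are disabled

eager = Expensive()
# Eager: the string is built before log.debug() is even called
log.debug("value: %s" % eager)

lazy = Expensive()
# Lazy: arguments are only formatted if the record is actually emitted
log.debug("value: %s", lazy)
```

With DEBUG disabled, the eager call still renders the object once, while the lazy call never renders it.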
James R. Barlow
cd220d9ed9
pipeline: search_window variable not actually used
2018-06-23 01:48:57 -07:00
James R. Barlow
76532649b8
tesseract.get_orientation: removed unused language parameter
2018-06-23 01:48:24 -07:00
James R. Barlow
b0dbaeafc5
Cleanup unused imports
2018-06-23 01:47:53 -07:00
James R. Barlow
2530d1791b
Fix several pylint errors and warnings
2018-06-23 00:54:22 -07:00
James R. Barlow
94150f414a
Remove qpdf.merge
We no longer need to merge pages this way. Much of the functionality
was there to implement page splitting without hitting ulimit which
will be fixed in qpdf > 8.0.2. The tests were expensive to run.
Also remove pytest-timeout since it breaks the Linux build.
2018-06-23 00:45:03 -07:00
James R. Barlow
54e74f84cc
Remove special handling of TypeError from ruffus
split_pages would still run if repair_pdf failed, for some reason.
Since we are no longer splitting pages this is vestigial.
2018-06-23 00:41:20 -07:00
James R. Barlow
76e7e8dbbb
Replace several uses of str(path) with fspath(path)
Helps make it more explicit. Did not do this to tests because use of paths
is more involved there.
2018-06-22 21:00:47 -07:00
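A short sketch of why `fspath(path)` is more explicit than `str(path)`, assuming `fspath` behaves like the standard library's `os.fspath` (available from Python 3.6; the project may use its own helper with the same name). The function `to_fs_arg` is hypothetical.

```python
import os
from pathlib import Path


def to_fs_arg(path):
    """Hypothetical helper: return a filesystem path, rejecting non-paths."""
    return os.fspath(path)


p = Path("input") / "scan.pdf"
converted = to_fs_arg(p)  # same text as str(p) for a real path object

# str() would silently turn None into the string "None";
# os.fspath() raises TypeError for anything that is not path-like.
try:
    to_fs_arg(None)
except TypeError:
    rejected = True
else:
    rejected = False
```

The difference only shows up on bad input: `str()` accepts any object, so a mistake such as passing `None` becomes a bogus filename instead of an immediate error.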
James R. Barlow
324598e992
Remove helpers.universal_open()
This helper function had only a single usage. It was always an awkward
way to support Python 3.5 that I'd forget to use.
2018-06-22 17:56:20 -07:00
James R. Barlow
9e765ddf46
Rename _optimize to optimize.py
2018-06-22 17:51:57 -07:00
James R. Barlow
6ac9e92f17
Fix PEP8 docstring convention misuse in a few places
2018-06-22 17:51:25 -07:00
James R. Barlow
faaa4a1def
Ghostscript, PDF/A: support pathlib
2018-06-22 17:45:10 -07:00
James R. Barlow
0aa51f0f3a
Remove fitz from Travis
2018-06-18 15:38:41 -07:00
James R. Barlow
73431d9761
Remove obsolete _naive_find_text
2018-06-13 14:00:50 -07:00
James R. Barlow
45cb4525cf
Remove other references to PyMuPDF
2018-06-13 01:02:53 -07:00
James R. Barlow
8c84c515b6
Use Ghostscript for text region detection
Ghostscript txtwrite seems to be quite effective at the task.
Eliminates dependency on fitz
2018-06-13 00:58:09 -07:00
James R. Barlow
1dfbbdebf4
Adjust for pikepdf API change
v7.0.0rc2
2018-06-08 22:47:56 -07:00
James R. Barlow
740918daee
Create debug envvar to override Creator or Producer
Note that Ghostscript always overrides Producer
2018-06-06 23:17:28 -07:00
jbarlow83
1d10eac764
Add wiki link to issue template
[ci skip]
2018-06-06 12:59:59 -07:00
jbarlow83
3f868118cd
Remove gpg
[ci skip]
2018-06-06 12:58:02 -07:00
James R. Barlow
04d79b15b4
optimize: fix error in Py3.5
v7.0.0rc1
2018-06-06 12:25:32 -07:00
James R. Barlow
a13c398c06
Suppress some spurious tesseract errors
2018-06-05 23:26:28 -07:00
James R. Barlow
e3b3f716ee
optimize: use tempdir for cmdline invocation
2018-06-05 21:20:54 -07:00
James R. Barlow
cf43c06f46
Use python-xmp-toolkit for xmp check
Eliminates PyPDF2 and defusedxml as dependencies.
2018-05-29 22:00:52 -07:00
James R. Barlow
74a5a18607
Tweak release notes
v7.0.0b4
2018-05-28 14:52:06 -07:00
James R. Barlow
44241c6dd5
Travis: remove deploy to testpypi since it's broken
2018-05-27 01:49:18 -07:00
James R. Barlow
8fff496ffd
Fix Py3.5 not understanding os.path.exists(Path(...))
v7.0.0b3
2018-05-26 22:55:22 -07:00
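For context on the Python 3.5 fix above: `os.path` functions only began accepting `pathlib.Path` objects in Python 3.6. A minimal sketch of a workaround that runs on both versions follows; the helper name `path_exists` is hypothetical and is not necessarily how the commit solved it.

```python
import os
import tempfile
from pathlib import Path


def path_exists(path):
    """Hypothetical helper: existence check that also works on Python 3.5."""
    # Converting to str first avoids relying on os.PathLike support,
    # which os.path.exists() only gained in Python 3.6.
    return os.path.exists(str(path))


tmpdir = Path(tempfile.mkdtemp())
present = path_exists(tmpdir)
missing = path_exists(tmpdir / "no-such-file.pdf")
os.rmdir(str(tmpdir))  # clean up the temporary directory
```

Calling `Path.exists()` directly on the path object is another option that works on Python 3.5.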
James R. Barlow
edf75c519c
Update v7 release notes
2018-05-26 02:08:49 -07:00
James R. Barlow
9608b22d34
Remove all uses of PyPDF2 except PDF/A check
Leave PDF/A check alone for now, since pikepdf has no equivalent.
2018-05-26 02:07:18 -07:00
James R. Barlow
8ba4968c48
pdfinfo: more robustness
2018-05-26 01:54:25 -07:00
James R. Barlow
ffdd78f1a5
pdfinfo: Fix text_operators type not changed in related commit
2018-05-25 02:10:39 -07:00
James R. Barlow
ad9f8ca78e
pdfinfo: reinstate stack normalization for q/Q
2018-05-25 01:28:26 -07:00
James R. Barlow
78a686ecb4
Consider qpdf behavior on algo4 a pass
qpdf opens files with null user password, so do the same.
2018-05-25 00:33:31 -07:00
James R. Barlow
59e786eb3c
Remove old code to deal with single page only things
2018-05-25 00:32:55 -07:00
James R. Barlow
6d0461435f
Use OperandGrouper whitelist
2018-05-24 22:52:33 -07:00
James R. Barlow
0a04a60f69
Document need for pdfinfo to be pickleable
2018-05-24 22:24:13 -07:00
James R. Barlow
68d8642988
Found out this test was extremely slow - no reason to actually use a large file
2018-05-24 22:22:51 -07:00
James R. Barlow
16f70ff054
Main changeset for pikepdf-based refactor of pdfinfo
2018-05-24 22:22:01 -07:00
James R. Barlow
c00aeafff0
Add scratch file
2018-05-24 22:20:15 -07:00
James R. Barlow
83f35e00f3
Start removing PyPDF2
2018-05-21 01:28:21 -07:00
James R. Barlow
786a2ad65a
Make optimize test do a little more
2018-05-18 17:50:39 -07:00