167 Commits

Author SHA1 Message Date
James R. Barlow
4581027246 Drop support for pdfminer.six 20181108
This version required a patch that has since been mainlined, and also did not
declare its dependency on chardet correctly. We can remove both hacks now.
2020-04-15 02:50:36 -07:00
James R. Barlow
f4f7946a0c Add colored logs 2020-04-15 00:05:38 -07:00
James R. Barlow
4fdbf55c11 setup: approve pdfminer.six 20200124 2020-02-09 23:50:56 -08:00
James R. Barlow
fd991a2380 Allow pdfminer.six 20200104 and update recommended versions 2020-01-05 21:37:28 -08:00
James R. Barlow
9559b0b186 Use pikepdf to perform qpdf.check() 2019-12-11 01:21:15 -08:00
James R. Barlow
9af59c0d6d docs: improvements for Windows 2019-12-09 21:39:01 -08:00
James R. Barlow
0a08d6ce1f Update version of pdfminer.six supported 2019-11-13 01:45:06 -08:00
James R. Barlow
5bd6665b49 Use pikepdf 1.7.0 to improve Python 3.8 support 2019-11-11 22:36:38 -08:00
James R. Barlow
3438afaffe Support pdfminer.six 20191020 2019-11-04 03:15:59 -08:00
James R. Barlow
cdcdd16865 Require Pillow 6.2.0 based on security vulnerability report in older versions 2019-10-23 12:27:29 -07:00
James R. Barlow
b55d7e57af Python 3.8 updates 2019-10-20 03:20:54 -07:00
James R. Barlow
1a91cd4652 pikepdf 1.6 2019-07-27 04:36:48 -07:00
James R. Barlow
6fbeb6347d Merge api (without plugins) 2019-07-27 02:04:01 -07:00
James R. Barlow
12769b96e5 Drop support for omitting pdfminer.six 2019-07-10 13:37:01 -07:00
James R. Barlow
e528adc603 pylint removal 2019-05-17 01:09:06 -07:00
James R. Barlow
13ab23ba54 Refactor weave_layers, introduce progress bar
Fixes a bug in this branch where --sidecar would fail by trying to iterate
over the executor futures twice.
2019-05-16 14:57:31 -07:00
James R. Barlow
c904b430b6 Merge master into api branch; all tests pass 2019-05-14 16:33:02 -07:00
James R. Barlow
c2fecffdb4 Require pikepdf 1.3.0 2019-05-12 02:16:05 -07:00
James R. Barlow
58c29ffb5c weave: use explicit pdf.close(), drastically reduce open file handles
With the new pikepdf 1.2.0 we no longer need to hold file handles
open because of the "copy to memory" functionality. We retain
the behavior of closing/reopening the output PDF every 100 pages as
a way to limit memory usage.
2019-04-18 15:12:48 -07:00
mawi
39617dd739 fix: remove ruffus 2019-04-08 11:07:32 +02:00
Martin Wind
2fa43ecf09 refactor: split argparse and run_pipline 2019-03-26 08:10:20 +01:00
James R. Barlow
67a405c6b7 Move install-time external program checks out of setup.py
We did runtime tests for several of them anyway, and it's better to check
at runtime since config may change after installation.
2019-03-03 03:26:56 -08:00
James R. Barlow
602570fcf9 Update requirements 2019-03-03 02:27:56 -08:00
James R. Barlow
92c8a5885e Declare build system in pyproject.toml 2019-02-26 12:23:33 -08:00
James R. Barlow
b8cd3acd9e v8.0.1 notes 2019-01-17 00:57:28 -08:00
James R. Barlow
f472587d22 Bump pikepdf version, point to release notes 2019-01-05 16:48:13 -08:00
James R. Barlow
f34b3015b2 Prevent Ghostscript from generating invalid XMP metadata
If DocumentInfo contains NULs, Ghostscript will generate XMP with
NULs, which is not allowed. Repair DocumentInfo before Ghostscript sees it.
2019-01-04 13:20:41 -08:00
James R. Barlow
089ece2715 use pikepdf 0.10.2 2019-01-03 12:08:43 -08:00
James R. Barlow
7d330afd81 Delinting 2019-01-02 13:34:45 -08:00
James R. Barlow
68fbd9fcc9 pikepdf: version bump 2018-12-31 15:37:31 -08:00
James R. Barlow
c771938907 Convert to f-strings where it makes sense 2018-12-31 15:01:19 -08:00
James R. Barlow
8c0009c5c8 Make pdfminer.six optional
Mainly since the current release of pdfminer.six lacks an sdist, blocking
homebrew packaging. Also in case other distros don't accept pdfminer.six.
2018-12-31 01:08:43 -08:00
James R. Barlow
8b90c45437 Drop support for Tesseract 3 2018-12-30 00:47:12 -08:00
James R. Barlow
72b920eb16 Drop support for Python 3.5 2018-12-30 00:23:26 -08:00
James R. Barlow
ab632f57cd v7.4.0 release notes 2018-12-15 15:27:23 -08:00
James R. Barlow
b973208137 Require pikepdf 0.9.1 2018-12-15 14:23:10 -08:00
James R. Barlow
5a7a8e573b Require pikepdf 0.9.0 2018-12-14 23:06:57 -08:00
James R. Barlow
632dab2cc0 Replace Ghostscript DOCINFO and fix 9.25 metadata date regression
We no longer use Ghostscript to manage PDF metadata, instead
omitting the DOCINFO segment from the pdfmark file we generate.

Instead all of the relevant metadata code has been migrated to pikepdf,
and we use that API. This should be more consistent and fixes the
Ghostscript version-dependent quirks.

Also removes our python-xmp-toolkit dependency, except for
testing.
2018-12-13 18:13:30 -08:00
James R. Barlow
9593aa4fb9 Merge v7.3.0 development 2018-11-11 01:38:42 -08:00
James R. Barlow
0f5c484b62 Travis: only need to specify chardet because we use pip install --no-deps 2018-11-10 13:57:04 -08:00
James R. Barlow
755b5d87e3 Add missing chardet, implied by pdfminer.six? 2018-11-10 01:50:51 -08:00
James R. Barlow
eed0424390 Update requirements 2018-11-10 00:56:04 -08:00
James R. Barlow
600d31a907 Require pikepdf 0.3.7 2018-10-30 16:22:05 -07:00
James R. Barlow
05aa43c856 Require pdfminer 2018-10-29 12:45:15 -07:00
Stefan Weil
a873278c2a Fix some recommendations from LGTM (#309)
* Fix unreachable code

This fixes an issue reported by LGTM.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Remove unused imports

This fixes several recommendations from LGTM.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-28 13:59:58 -07:00
James R. Barlow
f5807a2053 Require pikepdf 0.3.5 2018-10-21 21:37:15 -07:00
James R. Barlow
5650eba848 Cleanup MANIFEST.in, reorg requirements/*.txt, fix non-Unicode readme 2018-10-10 23:53:08 -07:00
James R. Barlow
29116e1dec Change to README.md 2018-09-19 21:01:24 -07:00
James R. Barlow
eaa324939f Upgrade to pikepdf 0.3.3
Closes #231
2018-09-19 15:30:54 -07:00
James R. Barlow
e0599fe8d7 Require pikepdf 0.3.2 2018-08-24 12:41:43 -07:00