2676 Commits

Author SHA1 Message Date
James R. Barlow
92c8a5885e Declare build system in pyproject.toml 2019-02-26 12:23:33 -08:00
James R. Barlow
7749d14252 Merge branch 'master' of github.com:jbarlow83/OCRmyPDF 2019-02-24 01:56:47 -08:00
Julien Ma
9b92af5aed README: install other language packs on macOS (#352)
The default homebrew formula installs only the English language pack.
Another brew formula exists to install all other language packs.
This makes it easier than having to do the whole install manually.
2019-02-19 10:13:36 -08:00
James R. Barlow
0bf26b03ae optimize: Modernize pikepdf usage 2019-02-16 14:03:10 -08:00
James R. Barlow
e2847ea4c3 v8.1.0 release notes (tag: v8.1.0) 2019-02-10 02:10:48 -08:00
James R. Barlow
19e35db2b7 Fix issue when weave handoff occurs with no OCR font present
If using --tesseract-timeout 0 and any image processing on a file with
more than 100 pages, the weave handoff will occur. Ensure this
works correctly even if no Glyphless font is present.

Closes #347
2019-02-10 02:05:59 -08:00
James R. Barlow
df688742d5 Fix exception on traversing corrupt ToC entries 2019-02-10 00:50:21 -08:00
James R. Barlow
42c2925f9d Activate black precommit 2019-02-08 14:09:08 -08:00
James R. Barlow
933f0b8f9b docs: more unpaper details 2019-02-08 13:05:09 -08:00
James R. Barlow
03ab5a8ee2 If --tesseract-timeout 0, say nothing when we time out
This is our "don't actually OCR" mode. No need to mention it.
2019-02-08 13:04:48 -08:00
James R. Barlow
4f06920224 Be os.nice()-r 2019-02-07 17:24:47 -08:00
James R. Barlow
a733b09623 webservice: add an optional config and larger upload limit 2019-02-07 17:24:17 -08:00
James R. Barlow
5483dacf52 Fuzz 2019-02-07 17:09:47 -08:00
James R. Barlow
ae7844ad88 --clean-final implies --clean
It's never made sense to leave it out before; might as well introduce it.
2019-02-07 17:08:08 -08:00
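A minimal sketch of how one option can imply another, as the `--clean-final` commit above describes. This is not the project's actual argument handling: the parser, the `parse_args` helper, and the `ocr-sketch` program name are hypothetical, and only the two flag names come from the commit message.

```python
import argparse


def parse_args(argv):
    # Hypothetical parser: only the two flags named in the commit are modeled.
    parser = argparse.ArgumentParser(prog="ocr-sketch")
    parser.add_argument("--clean", action="store_true")
    parser.add_argument("--clean-final", action="store_true")
    options = parser.parse_args(argv)
    # Requesting --clean-final switches --clean on as well.
    if options.clean_final:
        options.clean = True
    return options
```

Resolving the implication once, right after parsing, means later code only needs to check `options.clean`.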
James R. Barlow
a6e7485da6 docs: --unpaper-args 2019-02-07 17:06:51 -08:00
James R. Barlow
3bcc6d6121 Merge 'feature/unpaper-args' 2019-02-07 17:06:28 -08:00
James R. Barlow
9fe067bbd9 Merge branch 'master' of github.com:jbarlow83/OCRmyPDF 2019-02-07 16:31:06 -08:00
James R. Barlow
f095e91cb4 unpaper-args: add test case and harden feature 2019-02-07 16:21:02 -08:00
Charles Forcey
66c8d4b47a Adjust the docker pull command for webservice (#346)
Not completely sure this is correct, but I think `docker pull jbarlow83/ocrmypdf-webservice` might be the correct command for getting the web service version.  It installs as expected:

```
docker pull jbarlow83/ocrmypdf-webservice
Using default tag: latest
latest: Pulling from jbarlow83/ocrmypdf-webservice
38e2e6cd5626: Already exists 
705054bc3f5b: Already exists 
c7051e069564: Already exists 
7308e914506c: Already exists 
3977c3cd82d1: Already exists 
ec01b9573956: Already exists 
b508b5192a3c: Already exists 
ace6e737fffb: Already exists 
0a453ee84e11: Already exists 
f8cb8b66151b: Already exists 
f53c3b27b23f: Already exists 
22df51ea5473: Already exists 
e38d932f9f30: Already exists 
b9d3c1d5b53b: Already exists 
68be2088ada3: Already exists 
8b17945ab41b: Pull complete 
59c4aae491bd: Pull complete 
19dce698a07e: Pull complete 
Digest: sha256:0cc9433d490c9a65389403757bf6081a30bcd248055340a8789c23d9cdf9ac8a
Status: Downloaded newer image for jbarlow83/ocrmypdf-webservice:latest
```
2019-01-25 10:41:42 -08:00
James R. Barlow
721489a06c docs: remove reference to --skip-repair since the argument was removed 2019-01-18 05:44:11 -08:00
James R. Barlow
9a4493f211 Add --unpaper-args
Needs test code and stricter validation
2019-01-18 05:33:28 -08:00
James R. Barlow
edb4d6c586 docs: Clarify ArchLinux edition is in AUR 2019-01-18 05:29:37 -08:00
James R. Barlow
b8cd3acd9e v8.0.1 notes (tag: v8.0.1) 2019-01-17 00:57:28 -08:00
James R. Barlow
03779e33da docs: Update some install procedures for v8 changes
[ci skip]
2019-01-12 00:33:36 -08:00
James R. Barlow
c466483e82 docs: Explain intermediate files 2019-01-11 14:52:05 -08:00
James R. Barlow
e3a58219d1 Ensure XObjects with no subtype don't cause an exception
Closes #325
2019-01-08 16:46:08 -08:00
James R. Barlow
72337094ca v8.0.0 release notes (tag: v8.0.0) 2019-01-05 23:35:47 -08:00
James R. Barlow
f472587d22 Bump pikepdf version, point to release notes 2019-01-05 16:48:13 -08:00
James R. Barlow
f34b3015b2 Prevent Ghostscript from generating invalid XMP metadata
If DocumentInfo contains NULs Ghostscript will generate XMP with
NULs which is not allowed. Repair DocumentInfo before Ghostscript sees it.
2019-01-04 13:20:41 -08:00
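The repair described in the commit above can be illustrated with a small sketch. It assumes the DocumentInfo entries are already available as a plain Python mapping of keys to values. The `strip_nuls` helper is hypothetical and does not reflect how the project or pikepdf actually performs the repair.

```python
def strip_nuls(docinfo):
    """Return a copy of a metadata mapping with NUL characters removed.

    Only string values are changed; other values pass through untouched.
    """
    return {
        key: value.replace("\x00", "") if isinstance(value, str) else value
        for key, value in docinfo.items()
    }
```

Cleaning the values before any XMP is generated keeps the NULs from reaching the downstream tool at all.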
James R. Barlow
089ece2715 use pikepdf 0.10.2 2019-01-03 12:08:43 -08:00
James R. Barlow
6438465e3f Add fish completions 2019-01-02 17:08:30 -08:00
James R. Barlow
7d330afd81 Delinting 2019-01-02 13:34:45 -08:00
James R. Barlow
68fbd9fcc9 pikepdf: version bump 2018-12-31 15:37:31 -08:00
James R. Barlow
c771938907 Convert to f-strings where it makes sense 2018-12-31 15:01:19 -08:00
James R. Barlow
c2a947acf4 travis: fix 2018-12-31 01:18:30 -08:00
James R. Barlow
8c0009c5c8 Make pdfminer.six optional
Mainly since the current release of pdfminer.six lacks a sdist, blocking
homebrew packaging. Also in case other distros don't accept pdfminer.six.
2018-12-31 01:08:43 -08:00
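A common way to make a dependency optional, sketched here under the assumption that the package is imported as `pdfminer` (the import name used by pdfminer.six). The `load_optional` helper and the `HAVE_PDFMINER` flag are hypothetical names for illustration, not the project's code.

```python
import importlib


def load_optional(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# None when pdfminer.six is absent; callers check the flag before using it.
pdfminer = load_optional("pdfminer")
HAVE_PDFMINER = pdfminer is not None
```

Code paths that need the library can then test `HAVE_PDFMINER` and fall back or skip the feature when it is missing.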
James R. Barlow
cfc5cdf47d pdfa: remove a pile of deprecated code
It's now handled in pikepdf.
2018-12-31 00:05:13 -08:00
James R. Barlow
05152a8af9 Remove always-false Tess v3 tests 2018-12-30 02:01:05 -08:00
James R. Barlow
0880b16491 Sort imports with isort 2018-12-30 01:28:15 -08:00
James R. Barlow
06308a22ce Reformat with black 2018-12-30 01:27:49 -08:00
James R. Barlow
80bd7de580 Generate test cache 2018-12-30 01:02:37 -08:00
James R. Barlow
8b90c45437 Drop support for Tesseract 3 2018-12-30 00:47:12 -08:00
James R. Barlow
72b920eb16 Drop support for Python 3.5 2018-12-30 00:23:26 -08:00
James R. Barlow
b4a51907d6 Detect when metadata is dropped during PDF/A conversion 2018-12-30 00:13:25 -08:00
James R. Barlow
1ca1221432 leptonica.py: Fix exception on certain types of barcode failures
Closes #322
2018-12-19 17:23:23 -08:00
James R. Barlow
40b72b0fa8 v7.4.1 release notes 2018-12-19 16:41:09 -08:00
James R. Barlow
0e55b4ad52 Travis: remove Brewfile 2018-12-19 16:40:48 -08:00
James R. Barlow
7b4f5a8fc4 docs: try to fix readthedocs
[ci skip]
2018-12-19 15:30:07 -08:00
James R. Barlow
9261a38493 Readme: more media 2018-12-19 15:27:54 -08:00
James R. Barlow
cc8ff318ed New issue template 2018-12-19 15:27:44 -08:00