2895 Commits

Author SHA1 Message Date
James R. Barlow
974979b0a0 Merge branch 'feature/optimization-fixes' 2019-03-03 15:00:20 -08:00
James R. Barlow
66586bdaab optimize: Disable jpg->png migration
Needs more testing before release
2019-03-03 14:59:59 -08:00
James R. Barlow
01d2ea309f Fix Predictor name and photometric flip 2019-03-03 14:57:15 -08:00
James R. Barlow
e918480351 v8.2.0 release notes 2019-03-03 14:15:20 -08:00
James R. Barlow
52fd84fa95 Remove debug message 2019-03-03 13:31:10 -08:00
James R. Barlow
2c56b0935c docs: minor 2019-03-03 03:28:17 -08:00
James R. Barlow
4f69ace868 optimize: fix all JBIG2 images binned on last page
During some past refactor it appears we now end up treating
all JBIG2 images as if they appeared on the last page in the
file. This bug had no visual side effects but probably led to
suboptimal JBIG2 encoding.
2019-03-03 03:28:17 -08:00
James R. Barlow
497c531112 optimize: update comments 2019-03-03 03:28:17 -08:00
James R. Barlow
b27b92fbf3 optimize: on aggressive settings try JPG to PNG transcoding
If the color count of an image is low such as when black and white
documents are scanned in color, PNG with lossy quantization may
produce a superior encoding to JPEG. This is expensive to test however.
2019-03-03 03:28:17 -08:00
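A rough sketch of the low-color-count test that the commit message above describes, assuming pixels arrive as a plain iterable of RGB tuples. The function name `is_low_color` and the `max_colors` threshold are hypothetical illustrations, not taken from the OCRmyPDF source.

```python
def is_low_color(pixels, max_colors=256):
    """Return True if the image uses at most max_colors distinct colors.

    Hypothetical helper: stops early once the threshold is exceeded,
    so richly colored images are rejected cheaply.
    """
    seen = set()
    for pixel in pixels:
        seen.add(pixel)
        if len(seen) > max_colors:
            return False
    return True


# A black and white page scanned in color: only two distinct colors.
bw_scan = [(0, 0, 0), (255, 255, 255)] * 100
# A gradient with 32 distinct gray levels.
gradient = [(i, i, i) for i in range(32)]

bw_is_candidate = is_low_color(bw_scan, max_colors=16)
gradient_is_candidate = is_low_color(gradient, max_colors=16)
```

An image that passes such a test would only be a candidate for PNG transcoding. Both encodings would still have to be produced and compared by size, which is the expensive part the message mentions.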
James R. Barlow
2e6ba2df8c optimize: fix recoding of PNGs
Previously we opened pngquant-compressed PNGs with transcoding
because the transcode-free function in Leptonica didn't seem to
work. This meant Leptonica may have thrown away the hard work of
pngquant if it didn't understand the encoding.

This change resolves the issue and allows us to open PNG encoded
data and insert it into a PDF without transcoding. Should improve
encoding quality.
2019-03-03 03:28:17 -08:00
James R. Barlow
67a405c6b7 Move install-time external program checks out of setup.py
We did runtime tests for several of them anyway, and it's better to do
them at runtime since config may change after installation.
2019-03-03 03:26:56 -08:00
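A minimal sketch of a runtime check for external programs, in the spirit of the commit above, using only the standard library's `shutil.which`. The function name `find_missing_programs` and the program list are assumptions for illustration, not the project's actual checks.

```python
import shutil


def find_missing_programs(programs):
    """Return the names from programs that are not found on PATH right now."""
    return [name for name in programs if shutil.which(name) is None]


# The last name is deliberately made up so the example always reports
# at least one missing program.
missing = find_missing_programs(
    ["tesseract", "gs", "hypothetical-tool-that-does-not-exist-xyz"]
)
if missing:
    print("Missing external programs:", ", ".join(missing))
```

Because the lookup happens on each run, it reflects the current PATH rather than the state of the machine at install time.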
James R. Barlow
58e6663806 Update test cache for french->german change 2019-03-03 03:23:59 -08:00
James R. Barlow
602570fcf9 Update requirements 2019-03-03 02:27:56 -08:00
James R. Barlow
691f8ce254 Docs: reorganize for new docker-alpine image 2019-03-01 23:15:32 -08:00
James R. Barlow
22812e74b9 Merge branch 'master' of github.com:jbarlow83/OCRmyPDF 2019-02-26 13:01:59 -08:00
Martin Wind
9d824e723d Add Dockerfile based on alpine:3.9 (#354)
* Do not exclude .git from docker build

* Use multi-stage builds to keep the image size down

* Copy project files to get the test suite.

* Add webservice

* Add tesseract language data for German and Chinese Simplified
2019-02-26 13:01:38 -08:00
James R. Barlow
5dad800d85 Add version to build-system declaration 2019-02-26 12:58:44 -08:00
James R. Barlow
56a56a4dcb docs: avoid importing ocrmypdf 2019-02-26 12:57:50 -08:00
James R. Barlow
3f1d9ef99c Fix tests for move to Alpine dockerfile 2019-02-26 12:30:21 -08:00
James R. Barlow
92c8a5885e Declare build system in pyproject.toml 2019-02-26 12:23:33 -08:00
James R. Barlow
7749d14252 Merge branch 'master' of github.com:jbarlow83/OCRmyPDF 2019-02-24 01:56:47 -08:00
Julien Ma
9b92af5aed README: install other language packs on macOS (#352)
The default homebrew formula installs only the English language pack.
Another brew formula exists to install all other language packs.
This makes it easier than having to do the whole install manually.
2019-02-19 10:13:36 -08:00
James R. Barlow
0bf26b03ae optimize: Modernize pikepdf usage 2019-02-16 14:03:10 -08:00
James R. Barlow
e2847ea4c3 v8.1.0 release notes v8.1.0 2019-02-10 02:10:48 -08:00
James R. Barlow
19e35db2b7 Fix issue when weave handoff occurs with no OCR font present
If using --tesseract-timeout 0 and any image processing on a file with
more than 100 pages, the weave handoff will occur. Ensure this
works correctly even if no Glyphless font is present.

Closes #347
2019-02-10 02:05:59 -08:00
James R. Barlow
df688742d5 Fix exception on traversing corrupt ToC entries 2019-02-10 00:50:21 -08:00
James R. Barlow
42c2925f9d Activate black precommit 2019-02-08 14:09:08 -08:00
James R. Barlow
933f0b8f9b docs: more unpaper details 2019-02-08 13:05:09 -08:00
James R. Barlow
03ab5a8ee2 If --tesseract-timeout 0, say nothing when we time out
This is our "don't actually OCR" mode. No need to mention it.
2019-02-08 13:04:48 -08:00
James R. Barlow
4f06920224 Be os.nice()-r 2019-02-07 17:24:47 -08:00
James R. Barlow
a733b09623 webservice: add an optional config and larger upload limit 2019-02-07 17:24:17 -08:00
James R. Barlow
5483dacf52 Fuzz 2019-02-07 17:09:47 -08:00
James R. Barlow
ae7844ad88 --clean-final implies --clean
It's never made sense to leave it out before; might as well introduce it.
2019-02-07 17:08:08 -08:00
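One way an "option A implies option B" rule like the one above could be expressed with `argparse`. This is a sketch under assumptions: the parser setup and the `parse_options` name are hypothetical, and only the two flag names come from the commit message.

```python
import argparse


def parse_options(argv):
    """Parse a reduced, illustrative option set where --clean-final implies --clean."""
    parser = argparse.ArgumentParser(prog="example")
    parser.add_argument("--clean", action="store_true")
    parser.add_argument("--clean-final", action="store_true")
    options = parser.parse_args(argv)
    # Apply the implication after parsing so later code only checks one flag.
    if options.clean_final:
        options.clean = True
    return options


implied = parse_options(["--clean-final"])
default = parse_options([])
```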
James R. Barlow
a6e7485da6 docs: --unpaper-args 2019-02-07 17:06:51 -08:00
James R. Barlow
3bcc6d6121 Merge 'feature/unpaper-args' 2019-02-07 17:06:28 -08:00
James R. Barlow
9fe067bbd9 Merge branch 'master' of github.com:jbarlow83/OCRmyPDF 2019-02-07 16:31:06 -08:00
James R. Barlow
f095e91cb4 unpaper-args: add test case and harden feature 2019-02-07 16:21:02 -08:00
Charles Forcey
66c8d4b47a Adjust the docker pull command for webservice (#346)
Not completely sure this is correct, but I think `docker pull jbarlow83/ocrmypdf-webservice` might be the correct command for getting the web service version.  It installs as expected:

```
docker pull jbarlow83/ocrmypdf-webservice
Using default tag: latest
latest: Pulling from jbarlow83/ocrmypdf-webservice
38e2e6cd5626: Already exists 
705054bc3f5b: Already exists 
c7051e069564: Already exists 
7308e914506c: Already exists 
3977c3cd82d1: Already exists 
ec01b9573956: Already exists 
b508b5192a3c: Already exists 
ace6e737fffb: Already exists 
0a453ee84e11: Already exists 
f8cb8b66151b: Already exists 
f53c3b27b23f: Already exists 
22df51ea5473: Already exists 
e38d932f9f30: Already exists 
b9d3c1d5b53b: Already exists 
68be2088ada3: Already exists 
8b17945ab41b: Pull complete 
59c4aae491bd: Pull complete 
19dce698a07e: Pull complete 
Digest: sha256:0cc9433d490c9a65389403757bf6081a30bcd248055340a8789c23d9cdf9ac8a
Status: Downloaded newer image for jbarlow83/ocrmypdf-webservice:latest
```
2019-01-25 10:41:42 -08:00
James R. Barlow
721489a06c docs: remove reference to --skip-repair since the argument was removed 2019-01-18 05:44:11 -08:00
James R. Barlow
9a4493f211 Add --unpaper-args
Needs test code and stricter validation
2019-01-18 05:33:28 -08:00
James R. Barlow
edb4d6c586 docs: Clarify ArchLinux edition is in AUR 2019-01-18 05:29:37 -08:00
James R. Barlow
b8cd3acd9e v8.0.1 notes v8.0.1 2019-01-17 00:57:28 -08:00
James R. Barlow
03779e33da docs: Update some install procedures for v8 changes
[ci skip]
2019-01-12 00:33:36 -08:00
James R. Barlow
c466483e82 docs: Explain intermediate files 2019-01-11 14:52:05 -08:00
James R. Barlow
e3a58219d1 Ensure XObjects with no subtype don't cause an exception
Closes #325
2019-01-08 16:46:08 -08:00
James R. Barlow
72337094ca v8.0.0 release notes v8.0.0 2019-01-05 23:35:47 -08:00
James R. Barlow
f472587d22 Bump pikepdf version, point to release notes 2019-01-05 16:48:13 -08:00
James R. Barlow
f34b3015b2 Prevent Ghostscript from generating invalid XMP metadata
If DocumentInfo contains NULs Ghostscript will generate XMP with
NULs which is not allowed. Repair DocumentInfo before Ghostscript sees it.
2019-01-04 13:20:41 -08:00
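A small sketch of the kind of repair described above: removing NUL characters from metadata strings before another tool reads them. It works on a plain dict rather than a real PDF object, and `strip_nuls` is a hypothetical name, so treat it as an illustration of the idea, not the project's code.

```python
def strip_nuls(docinfo):
    """Return a copy of a metadata mapping with NULs removed from string values."""
    return {
        key: value.replace("\x00", "") if isinstance(value, str) else value
        for key, value in docinfo.items()
    }


# Stand-in for a DocumentInfo dictionary containing a stray NUL.
cleaned = strip_nuls({"/Title": "Scan\x00ned report", "/Pages": 3})
```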
James R. Barlow
089ece2715 use pikepdf 0.10.2 2019-01-03 12:08:43 -08:00
James R. Barlow
6438465e3f Add fish completions 2019-01-02 17:08:30 -08:00