James R. Barlow
974979b0a0
Merge branch 'feature/optimization-fixes'
2019-03-03 15:00:20 -08:00
James R. Barlow
66586bdaab
optimize: Disable jpg->png migration
...
Needs more testing before release
2019-03-03 14:59:59 -08:00
James R. Barlow
01d2ea309f
Fix Predictor name and photometric flip
2019-03-03 14:57:15 -08:00
James R. Barlow
e918480351
v8.2.0 release notes
2019-03-03 14:15:20 -08:00
James R. Barlow
52fd84fa95
Remove debug message
2019-03-03 13:31:10 -08:00
James R. Barlow
2c56b0935c
docs: minor
2019-03-03 03:28:17 -08:00
James R. Barlow
4f69ace868
optimize: fix all JBIG2 images binned on last page
...
During some past refactor, it appears we ended up treating
all JBIG2 images as if they appeared on the last page in the
file. This bug had no visual side effects but probably led to
suboptimal JBIG2 encoding.
2019-03-03 03:28:17 -08:00
James R. Barlow
497c531112
optimize: update comments
2019-03-03 03:28:17 -08:00
James R. Barlow
b27b92fbf3
optimize: on aggressive settings try JPG to PNG transcoding
...
If the color count of an image is low, such as when black-and-white
documents are scanned in color, PNG with lossy quantization may
produce a superior encoding to JPEG. This is expensive to test, however.
2019-03-03 03:28:17 -08:00
James R. Barlow
2e6ba2df8c
optimize: fix recoding of PNGs
...
Previously we opened pngquant-compressed PNGs with transcoding
because the transcode-free function in Leptonica didn't seem to
work. This meant Leptonica may have thrown away the hard work of
pngquant if it didn't understand the encoding.
This change resolves the issue and allows us to open PNG-encoded
data and insert it into a PDF without transcoding. This should improve
encoding quality.
2019-03-03 03:28:17 -08:00
James R. Barlow
67a405c6b7
Move install-time external program checks out of setup.py
...
We did runtime tests for several of them anyway, and it's better to check
at runtime since the configuration may change after installation.
2019-03-03 03:26:56 -08:00
James R. Barlow
58e6663806
Update test cache for french->german change
2019-03-03 03:23:59 -08:00
James R. Barlow
602570fcf9
Update requirements
2019-03-03 02:27:56 -08:00
James R. Barlow
691f8ce254
Docs: reorganize for new docker-alpine image
2019-03-01 23:15:32 -08:00
James R. Barlow
22812e74b9
Merge branch 'master' of github.com:jbarlow83/OCRmyPDF
2019-02-26 13:01:59 -08:00
Martin Wind
9d824e723d
Add Dockerfile based on alpine:3.9 (#354)
...
* Do not exclude .git from docker build
* Use multi-stage builds to keep the image size down
* Copy project files to get the test suite.
* Add webservice
* Add tesseract language data for German and Chinese Simplified
2019-02-26 13:01:38 -08:00
James R. Barlow
5dad800d85
Add version to build-system declaration
2019-02-26 12:58:44 -08:00
James R. Barlow
56a56a4dcb
docs: avoid importing ocrmypdf
2019-02-26 12:57:50 -08:00
James R. Barlow
3f1d9ef99c
Fix tests for move to Alpine dockerfile
2019-02-26 12:30:21 -08:00
James R. Barlow
92c8a5885e
Declare build system in pyproject.toml
2019-02-26 12:23:33 -08:00
James R. Barlow
7749d14252
Merge branch 'master' of github.com:jbarlow83/OCRmyPDF
2019-02-24 01:56:47 -08:00
Julien Ma
9b92af5aed
README: install other language packs on macOS (#352)
...
The default homebrew formula installs only the English language pack.
Another brew formula exists to install all other language packs.
This makes it easier than having to do the whole install manually.
2019-02-19 10:13:36 -08:00
James R. Barlow
0bf26b03ae
optimize: Modernize pikepdf usage
2019-02-16 14:03:10 -08:00
James R. Barlow
e2847ea4c3
v8.1.0 release notes
v8.1.0
2019-02-10 02:10:48 -08:00
James R. Barlow
19e35db2b7
Fix issue when weave handoff occurs with no OCR font present
...
If using --tesseract-timeout 0 and any image processing on a file with
more than 100 pages, the weave handoff will occur. Ensure this
works correctly even if no Glyphless font is present.
Closes #347
2019-02-10 02:05:59 -08:00
James R. Barlow
df688742d5
Fix exception on traversing corrupt ToC entries
2019-02-10 00:50:21 -08:00
James R. Barlow
42c2925f9d
Activate black precommit
2019-02-08 14:09:08 -08:00
James R. Barlow
933f0b8f9b
docs: more unpaper details
2019-02-08 13:05:09 -08:00
James R. Barlow
03ab5a8ee2
If --tesseract-timeout 0, say nothing when we time out
...
This is our "don't actually OCR" mode. No need to mention it.
2019-02-08 13:04:48 -08:00
James R. Barlow
4f06920224
Be os.nice()-r
2019-02-07 17:24:47 -08:00
James R. Barlow
a733b09623
webservice: add an optional config and larger upload limit
2019-02-07 17:24:17 -08:00
James R. Barlow
5483dacf52
Fuzz
2019-02-07 17:09:47 -08:00
James R. Barlow
ae7844ad88
--clean-final implies --clean
...
It's never made sense to leave it out before; might as well introduce it.
2019-02-07 17:08:08 -08:00
James R. Barlow
a6e7485da6
docs: --unpaper-args
2019-02-07 17:06:51 -08:00
James R. Barlow
3bcc6d6121
Merge 'feature/unpaper-args'
2019-02-07 17:06:28 -08:00
James R. Barlow
9fe067bbd9
Merge branch 'master' of github.com:jbarlow83/OCRmyPDF
2019-02-07 16:31:06 -08:00
James R. Barlow
f095e91cb4
unpaper-args: add test case and harden feature
2019-02-07 16:21:02 -08:00
Charles Forcey
66c8d4b47a
Adjust the docker pull command for webservice (#346)
...
Not completely sure this is correct, but I think `docker pull jbarlow83/ocrmypdf-webservice` might be the correct command for getting the web service version. It installs as expected:
```
docker pull jbarlow83/ocrmypdf-webservice
Using default tag: latest
latest: Pulling from jbarlow83/ocrmypdf-webservice
38e2e6cd5626: Already exists
705054bc3f5b: Already exists
c7051e069564: Already exists
7308e914506c: Already exists
3977c3cd82d1: Already exists
ec01b9573956: Already exists
b508b5192a3c: Already exists
ace6e737fffb: Already exists
0a453ee84e11: Already exists
f8cb8b66151b: Already exists
f53c3b27b23f: Already exists
22df51ea5473: Already exists
e38d932f9f30: Already exists
b9d3c1d5b53b: Already exists
68be2088ada3: Already exists
8b17945ab41b: Pull complete
59c4aae491bd: Pull complete
19dce698a07e: Pull complete
Digest: sha256:0cc9433d490c9a65389403757bf6081a30bcd248055340a8789c23d9cdf9ac8a
Status: Downloaded newer image for jbarlow83/ocrmypdf-webservice:latest
```
2019-01-25 10:41:42 -08:00
James R. Barlow
721489a06c
docs: remove reference to --skip-repair since the argument was removed
2019-01-18 05:44:11 -08:00
James R. Barlow
9a4493f211
Add --unpaper-args
...
Needs test code and stricter validation
2019-01-18 05:33:28 -08:00
James R. Barlow
edb4d6c586
docs: Clarify ArchLinux edition is in AUR
2019-01-18 05:29:37 -08:00
James R. Barlow
b8cd3acd9e
v8.0.1 notes
v8.0.1
2019-01-17 00:57:28 -08:00
James R. Barlow
03779e33da
docs: Update some install procedures for v8 changes
...
[ci skip]
2019-01-12 00:33:36 -08:00
James R. Barlow
c466483e82
docs: Explain intermediate files
2019-01-11 14:52:05 -08:00
James R. Barlow
e3a58219d1
Ensure XObjects with no subtype don't cause an exception
...
Closes #325
2019-01-08 16:46:08 -08:00
James R. Barlow
72337094ca
v8.0.0 release notes
v8.0.0
2019-01-05 23:35:47 -08:00
James R. Barlow
f472587d22
Bump pikepdf version, point to release notes
2019-01-05 16:48:13 -08:00
James R. Barlow
f34b3015b2
Prevent Ghostscript from generating invalid XMP metadata
...
If DocumentInfo contains NULs, Ghostscript will generate XMP with
NULs, which is not allowed. Repair DocumentInfo before Ghostscript sees it.
2019-01-04 13:20:41 -08:00
James R. Barlow
089ece2715
use pikepdf 0.10.2
2019-01-03 12:08:43 -08:00
James R. Barlow
6438465e3f
Add fish completions
2019-01-02 17:08:30 -08:00