James R. Barlow
979b0bcaed
tesseract: refactor logging
2019-11-05 15:38:09 -08:00
James R. Barlow
3438afaffe
Support pdfminer.six 20191020
v9.0.5
2019-11-04 03:15:59 -08:00
James R. Barlow
681fa039cc
Update release notes; disable Py3.8 test again
2019-11-04 03:00:15 -08:00
James R. Barlow
69e80f1545
docker-compose.test does not seem to be ready for production use
2019-11-04 02:58:57 -08:00
James R. Barlow
983835cce4
docs: add remark about optimizing without OCR
2019-11-04 02:32:29 -08:00
James R. Barlow
6c23b137e2
Docker: relocate dockerfile
2019-11-04 02:27:30 -08:00
James R. Barlow
d656b2b3f2
docs: remove comment about Ubuntu image
[ci skip]
2019-11-04 02:08:42 -08:00
James R. Barlow
031b800aac
Docker autotest: fix, maybe?
2019-11-04 02:04:07 -08:00
James R. Barlow
05eb85ee77
Docker: try adding automated test
2019-11-04 01:23:54 -08:00
James R. Barlow
4da5214ca9
Drop support for unpaper 6.1 on Ubuntu 14.04
2019-11-04 00:09:04 -08:00
James R. Barlow
1ee829dd59
Travis: enable Python 3.8 testing
2019-11-04 00:05:18 -08:00
James R. Barlow
99db5d91ae
Fix issue "MANIFEST.in exists" by removing MANIFEST.in
MANIFEST.in is always an issue
2019-11-04 00:03:49 -08:00
James R. Barlow
3a4490ee36
Dockerfile: fix jbig2 not copied over
2019-11-03 23:52:08 -08:00
James R. Barlow
a492e3b472
Dockerfile: fix errors from trying to build unneeded cached wheels
2019-11-03 23:51:55 -08:00
James R. Barlow
c3719d3b72
Dockerfile: remove venv from Ubuntu image; tweak reqs
2019-11-03 23:39:40 -08:00
James R. Barlow
ad48fc6415
Remove Alpine Docker image
2019-11-03 22:35:15 -08:00
James R. Barlow
7f8018ffde
Mention that v9.0.4 requires a source install for Py3.8 for now, due to lack of CI availability
v9.0.4
2019-11-03 01:49:36 -08:00
James R. Barlow
80651fe12c
Fix test suite error
2019-10-24 18:17:03 -07:00
James R. Barlow
a58209e895
Disable Py3.8 for now
2019-10-24 18:16:47 -07:00
James R. Barlow
775b958c55
Update release notes
2019-10-24 16:58:39 -07:00
James R. Barlow
cdcdd16865
Require Pillow 6.2.0 based on security vulnerability report in older versions
2019-10-23 12:27:29 -07:00
James R. Barlow
b332d76782
Mention when we default to English and the system locale is not English
Closes #337
2019-10-22 01:49:38 -07:00
James R. Barlow
3660007fc8
travis: Python 3.8, osx_image
2019-10-20 04:06:13 -07:00
James R. Barlow
b55d7e57af
Python 3.8 updates
2019-10-20 03:20:54 -07:00
James R. Barlow
6e99e7b346
Use lstm_use_matrix for --user-words, --user-patterns
2019-10-20 00:49:11 -07:00
James R. Barlow
4d26867dee
Delinting
2019-09-20 17:17:11 -07:00
James R. Barlow
78e8bf9cbf
Use at most 3 Tesseract threads
Based on a user suggestion and tesseract-ocr/tesseract#2611, I reviewed
thread limits and found that a thread limit of 3 is still beneficial, but
a limit of 4 is not. Timings:
> time env OMP_THREAD_LIMIT=2 tesseract omp4.png stdout >/dev/null
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 143
116.67user 1.67system 1:26.26elapsed 137%CPU (0avgtext+0avgdata 356752maxresident)k
2213inputs+0outputs (18major+131059minor)pagefaults 0swaps
> time env OMP_THREAD_LIMIT=3 tesseract omp4.png stdout >/dev/null
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 143
136.89user 1.63system 1:19.56elapsed 174%CPU (0avgtext+0avgdata 356784maxresident)k
821inputs+0outputs (0major+131080minor)pagefaults 0swaps
> time env OMP_THREAD_LIMIT=4 tesseract omp4.png stdout >/dev/null
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 143
161.31user 1.51system 1:18.80elapsed 206%CPU (0avgtext+0avgdata 356632maxresident)k
8477inputs+0outputs (12major+131074minor)pagefaults 0swaps
> time env OMP_THREAD_LIMIT=8 tesseract omp4.png stdout >/dev/null
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 143
160.30user 1.62system 1:18.01elapsed 207%CPU (0avgtext+0avgdata 356640maxresident)k
821inputs+0outputs (0major+131078minor)pagefaults 0swaps
2019-09-20 17:12:36 -07:00
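The cap described in the commit above can be sketched as follows. This is a minimal illustration, not the project's actual code: the helper name `tesseract_env` and the constant `MAX_TESSERACT_THREADS` are hypothetical, and the choice to respect a lower limit already set by the user is an assumption. Only the `OMP_THREAD_LIMIT` variable and the value 3 come from the commit message.

```python
import os

# Cap taken from the benchmark in the commit message: 3 threads still
# helped, 4 and 8 gave almost no further reduction in elapsed time.
MAX_TESSERACT_THREADS = 3  # hypothetical constant name


def tesseract_env(base_env=None, cpu_count=None):
    """Return a copy of the environment with OMP_THREAD_LIMIT capped.

    Hypothetical helper: base_env and cpu_count are injectable so the
    logic can be exercised without touching the real environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    limit = min(MAX_TESSERACT_THREADS, cpus)

    # Assumption: if the user already set a lower positive limit, keep it.
    existing = env.get("OMP_THREAD_LIMIT", "")
    if existing.isdigit() and int(existing) >= 1:
        limit = min(limit, int(existing))

    env["OMP_THREAD_LIMIT"] = str(limit)
    return env
```

The returned mapping could then be passed as the `env` argument of `subprocess.run` when launching `tesseract`, which mirrors the `env OMP_THREAD_LIMIT=N tesseract ...` invocations in the timings.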
James R. Barlow
de61530d4d
docs: fix intermediate file list for v9
2019-09-20 17:02:35 -07:00
James R. Barlow
c149f860b5
Add contributing guide
2019-09-20 17:02:22 -07:00
James R. Barlow
68c852acec
Remove test_tesseract_config_invalid from suite
Also causes problems in CI
2019-09-18 13:28:02 -07:00
James R. Barlow
a8565bac6e
Fix any False in the ocrmypdf.ocr() API being set to True
2019-09-15 01:47:31 -07:00
James R. Barlow
6e8b0c3194
Fix py36 test including 37
2019-09-15 01:47:10 -07:00
James R. Barlow
ff860e8362
Fix black settings in pyproject.toml
2019-09-15 01:46:13 -07:00
James R. Barlow
cf4b04c5d1
optimize: work around pikepdf 1.6.3 limitation with indexed ICCBased colorspaces
2019-09-11 12:56:27 -07:00
James R. Barlow
078bc2abe9
pdfa: assume 3 RGB channels always
2019-09-11 12:55:38 -07:00
James R. Barlow
d7b7ca0574
v9.0.3 notes; Remove test_tesseract_config_notfound from suite
v9.0.3
2019-09-05 13:39:43 -07:00
James R. Barlow
17ac9d7a9a
Embed ICC profile in .ps (fixing Ghostscript 9.28 compatibility)
Previously we included the filename, which required PostScript to run
with file access enabled. For security, Ghostscript 9.28 enables
``-dSAFER`` and as such no longer permits access to any file by default.
This fix is necessary for compatibility with Ghostscript 9.28.
We use ASCII85 for a slightly more compact representation.
2019-09-05 13:17:26 -07:00
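As an illustration of the ASCII85 step mentioned in the commit above, here is a minimal sketch using Python's standard `base64.a85encode`. The helper name `icc_to_ascii85` and the 64-column wrap are hypothetical, and the comparison against hex encoding is an assumed baseline (the commit does not say what ASCII85 is more compact than). How the encoded text is spliced into the generated .ps file is not shown here.

```python
import base64


def icc_to_ascii85(icc_bytes, wrapcol=64):
    """Encode raw ICC profile bytes as Adobe-style ASCII85 text.

    Hypothetical helper: adobe=True frames the data with <~ and ~>,
    and wrapcol keeps lines short enough for a text file.
    """
    encoded = base64.a85encode(icc_bytes, adobe=True, wrapcol=wrapcol)
    return encoded.decode("ascii")


# Stand-in payload; a real caller would read the .icc file's bytes.
payload = bytes(range(256)) * 4
text = icc_to_ascii85(payload)

# ASCII85 uses 5 characters per 4 bytes, hex uses 8 characters per 4 bytes.
print(len(text), len(payload.hex()))
```

Embedding the data inline this way avoids any file read at interpretation time, which is the property the commit relies on under ``-dSAFER``.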
James R. Barlow
a2a197ce4c
v9.0.2 release notes
v9.0.2
2019-09-04 02:34:21 -07:00
James R. Barlow
944d59e5ad
Fix --print-parameters issue when chi_sim is not installed
2019-09-04 01:17:52 -07:00
James R. Barlow
1c3e90a892
optimize: solve monochrome by converting to G4
2019-09-04 00:51:47 -07:00
James R. Barlow
c728836956
Adjust test requirements
2019-09-04 00:50:48 -07:00
James R. Barlow
0d80fab339
Remove restriction on pytest < 5
2019-09-03 23:47:55 -07:00
James R. Barlow
a650caa599
optimize: don't consider 1bpp images for PNG optimization
2019-09-03 23:47:20 -07:00
James R. Barlow
c6caff90a1
optimize: only re-insert pngs after pngquant
Previously we attempted to reinsert all PNGs, but it appears to be
unlikely that Leptonica's API is actually capable of optimizing the PNG
before it inserts it.
In any event, qpdf has gained image optimization capabilities as well,
which we could borrow.
2019-09-03 23:46:25 -07:00
James R. Barlow
671c88d3b5
optimize: exclude images with custom Decode tables
2019-09-03 23:37:23 -07:00
James R. Barlow
b2cfaedf91
optimize: Don't reinsert 1bpp images
There seem to be version-to-version inconsistencies in
Leptonica's photometric interpretation of 1bpp images; in
particular, commit a0692307 introduces a change to force transcoding
in this situation.
However, I never entirely got to the bottom of where the problem
is, and in any event 1bpp images are probably better optimized
by JBIG2 than pngquant, so we're going to stop running them through
pngquant.
2019-09-03 23:26:13 -07:00
James R. Barlow
19ba3ae011
Allow test_german to xfail if deu language is not installed
2019-09-03 17:38:54 -07:00
James R. Barlow
feff1e38bb
Use context managers to ensure Pillow images are closed
2019-09-03 17:19:12 -07:00
James R. Barlow
c8d6ea6b10
Fix tests broken by --print-parameters change
2019-09-03 17:17:24 -07:00
James R. Barlow
b0d9775343
Attempt to resolve black-inversion issue
2019-08-31 01:25:36 -07:00