2895 Commits

Author SHA1 Message Date
James R. Barlow
979b0bcaed tesseract: refactor logging 2019-11-05 15:38:09 -08:00
James R. Barlow
3438afaffe Support pdfminer.six 20191020 v9.0.5 2019-11-04 03:15:59 -08:00
James R. Barlow
681fa039cc Update release notes; disable Py3.8 test again 2019-11-04 03:00:15 -08:00
James R. Barlow
69e80f1545 docker-compose.test does not seem to be ready for production use 2019-11-04 02:58:57 -08:00
James R. Barlow
983835cce4 docs: add remark about optimizing without OCR 2019-11-04 02:32:29 -08:00
James R. Barlow
6c23b137e2 Docker: relocate dockerfile 2019-11-04 02:27:30 -08:00
James R. Barlow
d656b2b3f2 docs: remove comment about Ubuntu image
[ci skip]
2019-11-04 02:08:42 -08:00
James R. Barlow
031b800aac Docker autotest: fix, maybe? 2019-11-04 02:04:07 -08:00
James R. Barlow
05eb85ee77 Docker: try adding automated test 2019-11-04 01:23:54 -08:00
James R. Barlow
4da5214ca9 Drop support for unpaper 6.1 on Ubuntu 14.04 2019-11-04 00:09:04 -08:00
James R. Barlow
1ee829dd59 Travis: enable Python 3.8 testing 2019-11-04 00:05:18 -08:00
James R. Barlow
99db5d91ae Fix issue "MANIFEST.in exists" by removing MANIFEST.in
MANIFEST.in is always an issue
2019-11-04 00:03:49 -08:00
James R. Barlow
3a4490ee36 Dockerfile: fix jbig2 not copied over 2019-11-03 23:52:08 -08:00
James R. Barlow
a492e3b472 Dockerfile: fix errors from trying to build unneeded cached wheels 2019-11-03 23:51:55 -08:00
James R. Barlow
c3719d3b72 Dockerfile: remove venv from Ubuntu image; tweak reqs 2019-11-03 23:39:40 -08:00
James R. Barlow
ad48fc6415 Remove Alpine Docker image 2019-11-03 22:35:15 -08:00
James R. Barlow
7f8018ffde Mention that v9.0.4 requires a source install for Py3.8 for now, due to lack of CI availability v9.0.4 2019-11-03 01:49:36 -08:00
James R. Barlow
80651fe12c Fix test suite error 2019-10-24 18:17:03 -07:00
James R. Barlow
a58209e895 Disable Py3.8 for now 2019-10-24 18:16:47 -07:00
James R. Barlow
775b958c55 Update release notes 2019-10-24 16:58:39 -07:00
James R. Barlow
cdcdd16865 Require Pillow 6.2.0 based on security vulnerability report in older versions 2019-10-23 12:27:29 -07:00
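Commit cdcdd16865 raises the minimum Pillow version to 6.2.0. The log does not show how the minimum is enforced. The sketch below is a hypothetical runtime check (`version_tuple` and `meets_minimum` are illustrative names) whose main point is that versions must be compared numerically: compared as strings, "10.0.0" would sort before "6.2.0".

```python
import re

# Minimum taken from commit cdcdd16865; the helpers are an illustrative sketch.
MIN_PILLOW = (6, 2, 0)


def version_tuple(version):
    """Turn '6.2.0' into (6, 2, 0).

    Only the leading dotted-numeric part is used, so pre-release or dev
    suffixes are ignored: '6.2.0rc1' is treated as 6.2.0 (a simplification).
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version, minimum=MIN_PILLOW):
    """Compare numerically, padding with zeros so '6.2' equals '6.2.0'."""
    have = version_tuple(version)
    width = max(len(have), len(minimum))
    have = have + (0,) * (width - len(have))
    need = tuple(minimum) + (0,) * (width - len(minimum))
    return have >= need
```

On Python 3.8+, `importlib.metadata.version("Pillow")` returns the installed version string to feed into such a check. In practice a requirement like this is usually declared as a dependency specifier (for example `Pillow >= 6.2.0`) so the installer enforces it before any code runs.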
James R. Barlow
b332d76782 Mention when we default to English and the system locale is not English
Closes #337
2019-10-22 01:49:38 -07:00
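Commit b332d76782 adds a notice when OCR falls back to English even though the system locale is not English. One way to decide when that notice is due is sketched below. The function name and the locale-parsing rules are assumptions for illustration, not OCRmyPDF's code.

```python
def should_warn_default_english(system_locale, requested_languages):
    """Return True when OCR will fall back to English although the system
    locale names some other language (hypothetical helper).

    system_locale: POSIX-style name such as 'de_DE.UTF-8', 'de_DE', or None.
    requested_languages: Tesseract language codes the user asked for.
    """
    if requested_languages:
        # Languages were chosen explicitly, so there is no silent default.
        return False
    if not system_locale:
        return False
    # Strip the encoding ('.UTF-8') and any modifier ('@euro').
    base = system_locale.split(".")[0].split("@")[0]
    if base in ("C", "POSIX"):
        # The neutral C/POSIX locale says nothing about the user's language.
        return False
    language = base.split("_")[0].lower()
    return language != "en"
```

A caller could pass `locale.getlocale()[0]` from the standard library as `system_locale` and the user's `-l` languages as `requested_languages`.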
James R. Barlow
3660007fc8 travis: Python 3.8, osx_image 2019-10-20 04:06:13 -07:00
James R. Barlow
b55d7e57af Python 3.8 updates 2019-10-20 03:20:54 -07:00
James R. Barlow
6e99e7b346 Use lstm_use_matrix for --user-words,patterns 2019-10-20 00:49:11 -07:00
James R. Barlow
4d26867dee Delinting 2019-09-20 17:17:11 -07:00
James R. Barlow
78e8bf9cbf Use at most 3 Tesseract threads
Based on a user suggestion and
tesseract-ocr/tesseract#2611, I reviewed thread limits and found that
a thread limit of 3 is still beneficial, but 4 is not.

> time env OMP_THREAD_LIMIT=2 tesseract omp4.png stdout >/dev/null
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 143
116.67user 1.67system 1:26.26elapsed 137%CPU (0avgtext+0avgdata 356752maxresident)k
2213inputs+0outputs (18major+131059minor)pagefaults 0swaps
> time env OMP_THREAD_LIMIT=3 tesseract omp4.png stdout >/dev/null
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 143
136.89user 1.63system 1:19.56elapsed 174%CPU (0avgtext+0avgdata 356784maxresident)k
821inputs+0outputs (0major+131080minor)pagefaults 0swaps
> time env OMP_THREAD_LIMIT=4 tesseract omp4.png stdout >/dev/null
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 143
161.31user 1.51system 1:18.80elapsed 206%CPU (0avgtext+0avgdata 356632maxresident)k
8477inputs+0outputs (12major+131074minor)pagefaults 0swaps
> time env OMP_THREAD_LIMIT=8 tesseract omp4.png stdout >/dev/null
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 143
160.30user 1.62system 1:18.01elapsed 207%CPU (0avgtext+0avgdata 356640maxresident)k
821inputs+0outputs (0major+131078minor)pagefaults 0swaps
2019-09-20 17:12:36 -07:00
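The timings quoted in commit 78e8bf9cbf can be summarized numerically. The sketch below parses the elapsed (wall-clock) times and user CPU times from that message and reports, for each step up in `OMP_THREAD_LIMIT`, the speedup and the extra CPU spent. It also includes a hypothetical helper, `capped_thread_env`, showing one way to pass the cap to a Tesseract subprocess; keeping a lower value the user already set is an assumption of this sketch.

```python
# Elapsed wall-clock time and user CPU seconds from the commit message,
# keyed by OMP_THREAD_LIMIT.
RUNS = {
    2: ("1:26.26", 116.67),
    3: ("1:19.56", 136.89),
    4: ("1:18.80", 161.31),
    8: ("1:18.01", 160.30),
}


def elapsed_seconds(text):
    """Convert GNU time's [h:]m:ss.ss elapsed field to seconds."""
    seconds = 0.0
    for field in text.split(":"):
        seconds = seconds * 60 + float(field)
    return seconds


def marginal_gains(runs):
    """For each step up in thread limit: (from, to, speedup, extra user CPU)."""
    limits = sorted(runs)
    gains = []
    for prev, cur in zip(limits, limits[1:]):
        speedup = elapsed_seconds(runs[prev][0]) / elapsed_seconds(runs[cur][0])
        extra_user_cpu = runs[cur][1] - runs[prev][1]
        gains.append((prev, cur, speedup, extra_user_cpu))
    return gains


def capped_thread_env(base_env, limit=3):
    """Return a copy of base_env with OMP_THREAD_LIMIT no higher than limit."""
    env = dict(base_env)
    try:
        current = int(env.get("OMP_THREAD_LIMIT", ""))
    except ValueError:
        current = None  # unset or not a number
    if current is None or current < 1 or current > limit:
        env["OMP_THREAD_LIMIT"] = str(limit)
    return env


for prev, cur, speedup, extra in marginal_gains(RUNS):
    print(f"{prev} -> {cur} threads: {speedup:.3f}x faster, {extra:+.2f}s user CPU")
```

On these figures, going from 2 to 3 threads is about 1.084x faster, while going from 3 to 4 is only about 1.010x faster yet costs roughly 24 extra seconds of user CPU. To apply the cap to one Tesseract process, pass `env=capped_thread_env(os.environ)` to `subprocess.run`.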
James R. Barlow
de61530d4d docs: fix intermediate file list for v9 2019-09-20 17:02:35 -07:00
James R. Barlow
c149f860b5 Add contributing guide 2019-09-20 17:02:22 -07:00
James R. Barlow
68c852acec Remove test_tesseract_config_invalid from suite
Also causes problems in CI
2019-09-18 13:28:02 -07:00
James R. Barlow
a8565bac6e Fix False values passed to the ocrmypdf.ocr() API being treated as True 2019-09-15 01:47:31 -07:00
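The log for commit a8565bac6e does not say how the bug arose. One common way it happens, shown here as a hypothetical sketch rather than OCRmyPDF's actual code, is when keyword arguments are translated into command-line switches: an on/off switch has no spelling for False, so emitting the flag for every boolean turns False into True.

```python
def kwargs_to_argv_buggy(**kwargs):
    """Hypothetical translation of API keywords into CLI arguments (buggy)."""
    argv = []
    for name, value in kwargs.items():
        flag = "--" + name.replace("_", "-")
        if isinstance(value, bool):
            argv.append(flag)  # BUG: emitted even when value is False
        elif value is not None:
            argv.extend([flag, str(value)])
    return argv


def kwargs_to_argv(**kwargs):
    """Same translation, but a False boolean simply omits its switch."""
    argv = []
    for name, value in kwargs.items():
        flag = "--" + name.replace("_", "-")
        if isinstance(value, bool):
            if value:  # only True turns the switch on
                argv.append(flag)
        elif value is not None:
            argv.extend([flag, str(value)])
    return argv
```

The bool check must come before the generic branch: `str(False)` would otherwise produce an argument such as `--deskew False`, which a switch-style option would reject or misread.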
James R. Barlow
6e8b0c3194 Fix py36 test including 37 2019-09-15 01:47:10 -07:00
James R. Barlow
ff860e8362 Fix black settings in pyproject.toml 2019-09-15 01:46:13 -07:00
James R. Barlow
cf4b04c5d1 optimize: work around pikepdf 1.6.3 limitation with indexed ICCbased colorspaces 2019-09-11 12:56:27 -07:00
James R. Barlow
078bc2abe9 pdfa: assume 3 RGB channels always 2019-09-11 12:55:38 -07:00
James R. Barlow
d7b7ca0574 v9.0.3 notes; Remove test_tesseract_config_notfound from suite v9.0.3 2019-09-05 13:39:43 -07:00
James R. Barlow
17ac9d7a9a Embed ICC profile in .ps (fixing Ghostscript 9.28 compatibility)
Previously we included the profile's filename, which required PostScript to
run with file access enabled. For security, Ghostscript 9.28 enables
``-dSAFER`` and as such no longer permits access to any file by default.
This fix is necessary for compatibility with Ghostscript 9.28.

We use ASCII85 for a slightly more compact representation.
2019-09-05 13:17:26 -07:00
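Commit 17ac9d7a9a embeds the ICC profile in the .ps file as ASCII85. The sketch below shows only the encoding step, using Python's standard `base64.a85encode` with Adobe framing, which yields a PostScript ASCII base-85 string literal (`<~ ... ~>`). How the string is then wired into the PostScript program is not shown in the log and is left out here; the function names are illustrative.

```python
import base64


def ps_ascii85_string(data: bytes) -> str:
    """Encode bytes as a PostScript ASCII base-85 string literal, <~ ... ~>.

    The data travels inside the .ps program itself, so the interpreter never
    has to open a separate file (which -dSAFER forbids by default).
    """
    return base64.a85encode(data, adobe=True).decode("ascii")


def ps_hex_string(data: bytes) -> str:
    """The same bytes as a PostScript hexadecimal string literal, < ... >."""
    return "<" + data.hex() + ">"
```

ASCII85 turns every 4 input bytes into 5 characters, while hex needs 2 characters per byte, so the encoded profile is about 5/8 the size of a hex string. That is the "slightly more compact representation" the commit mentions, assuming hex is the alternative being compared.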
James R. Barlow
a2a197ce4c v9.0.2 release notes v9.0.2 2019-09-04 02:34:21 -07:00
James R. Barlow
944d59e5ad Fix --print-parameters issue when chi_sim is not installed 2019-09-04 01:17:52 -07:00
James R. Barlow
1c3e90a892 optimize: solve monochrome by converting to G4 2019-09-04 00:51:47 -07:00
James R. Barlow
c728836956 Adjust test requirements 2019-09-04 00:50:48 -07:00
James R. Barlow
0d80fab339 Remove restriction on pytest < 5 2019-09-03 23:47:55 -07:00
James R. Barlow
a650caa599 optimize: don't consider 1bpp images for PNG optimization 2019-09-03 23:47:20 -07:00
James R. Barlow
c6caff90a1 optimize: only re-insert pngs after pngquant
Previously we attempted to reinsert all PNGs, but it appears to be
unlikely that Leptonica's API is actually capable of optimizing the PNG
before it inserts it.

In any event qpdf has gained image optimization capabilities as well
which we could borrow.
2019-09-03 23:46:25 -07:00
James R. Barlow
671c88d3b5 optimize: exclude images with custom Decode tables 2019-09-03 23:37:23 -07:00
James R. Barlow
b2cfaedf91 optimize: Don't reinsert 1bpp images
There seem to be version-to-version inconsistencies between
Leptonica's photometric interpretation of 1bpp images, in
particular commit a0692307 introduces a change to force transcoding
in this situation.

However, I never entirely got to the bottom of where the problem
is, and in any event 1bpp images are probably better optimized
by JBIG2 than pngquant, so we're going to stop running them through
pngquant.
2019-09-03 23:26:13 -07:00
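Commits a650caa599, 671c88d3b5 and b2cfaedf91 narrow which images go through the pngquant path: 1bpp images are skipped (JBIG2 is the better fit for them), and so are images with custom Decode tables. A minimal sketch of that filtering rule follows. `ImageInfo` and the helper names are hypothetical, and the default-Decode test assumes a non-indexed color space whose components range over [0, 1].

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ImageInfo:
    """Hypothetical summary of the image XObject entries that matter here."""
    bits_per_component: int  # /BitsPerComponent
    num_components: int  # 1 for DeviceGray, 3 for DeviceRGB
    decode: Optional[Sequence[float]] = None  # /Decode array, None if absent


def has_custom_decode(info: ImageInfo) -> bool:
    """True if /Decode is present and differs from the default [0 1 ...].

    Indexed images have a different default and are not handled here.
    """
    if info.decode is None:
        return False
    default = [0.0, 1.0] * info.num_components
    return [float(x) for x in info.decode] != default


def eligible_for_pngquant(info: ImageInfo) -> bool:
    if info.bits_per_component == 1:
        # 1bpp images are left to JBIG2 instead of pngquant.
        return False
    if has_custom_decode(info):
        # A nondefault /Decode remaps sample values; re-encoding without
        # carrying that mapping over could change how the image renders.
        return False
    return True
```

An explicit `/Decode [0 1]` on a gray image is treated as default here, so only a genuinely remapping table (such as `[1 0]`, which inverts the image) excludes it.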
James R. Barlow
19ba3ae011 Allow test_german to xfail if deu language is not installed 2019-09-03 17:38:54 -07:00
James R. Barlow
feff1e38bb Use context managers to ensure Pillow images are closed 2019-09-03 17:19:12 -07:00
James R. Barlow
c8d6ea6b10 Fix tests broken by --print-parameters change 2019-09-03 17:17:24 -07:00
James R. Barlow
b0d9775343 Attempt to resolve black-inversion issue 2019-08-31 01:25:36 -07:00