2676 Commits

Author SHA1 Message Date
James R. Barlow
775b958c55 Update release notes 2019-10-24 16:58:39 -07:00
James R. Barlow
cdcdd16865 Require Pillow 6.2.0 based on security vulnerability report in older versions 2019-10-23 12:27:29 -07:00
James R. Barlow
b332d76782 Mention when we default to English and the system locale is not English
Closes #337
2019-10-22 01:49:38 -07:00
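A minimal sketch of such a notice follows. It assumes the locale is read with Python's standard `locale` module. The helper name `default_language_notice` and the message wording are hypothetical and are not OCRmyPDF's actual code.

```python
import locale
from typing import List, Optional


def default_language_notice(
    requested: Optional[List[str]], loc: Optional[str] = None
) -> Optional[str]:
    """Hypothetical helper: explain an English default on a non-English locale.

    `requested` is the user's -l/--language list (None or empty if omitted);
    `loc` is a locale name such as "de_DE" (read from the system if None).
    """
    if requested:
        return None  # languages chosen explicitly; nothing to mention
    if loc is None:
        loc = locale.getlocale()[0]  # may be None in the "C" locale
    if loc in (None, "C", "POSIX"):
        return None  # no meaningful locale information to compare against
    if loc.lower().startswith("en"):
        return None  # English locale: the default is unsurprising
    return (
        "No language specified; defaulting to English (eng), "
        f"although the system locale is {loc}. Use -l to choose another language."
    )
```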
James R. Barlow
3660007fc8 travis: Python 3.8, osx_image 2019-10-20 04:06:13 -07:00
James R. Barlow
b55d7e57af Python 3.8 updates 2019-10-20 03:20:54 -07:00
James R. Barlow
6e99e7b346 Use lstm_use_matrix for --user-words and --user-patterns 2019-10-20 00:49:11 -07:00
James R. Barlow
4d26867dee Delinting 2019-09-20 17:17:11 -07:00
James R. Barlow
78e8bf9cbf Use at most 3 Tesseract threads
Based on a user suggestion and
tesseract-ocr/tesseract#2611, I reviewed thread limits and found that
a thread limit of 3 is still beneficial, but 4 is not.

> time env OMP_THREAD_LIMIT=2 tesseract omp4.png stdout >/dev/null
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 143
116.67user 1.67system 1:26.26elapsed 137%CPU (0avgtext+0avgdata 356752maxresident)k
2213inputs+0outputs (18major+131059minor)pagefaults 0swaps
> time env OMP_THREAD_LIMIT=3 tesseract omp4.png stdout >/dev/null
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 143
136.89user 1.63system 1:19.56elapsed 174%CPU (0avgtext+0avgdata 356784maxresident)k
821inputs+0outputs (0major+131080minor)pagefaults 0swaps
> time env OMP_THREAD_LIMIT=4 tesseract omp4.png stdout >/dev/null
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 143
161.31user 1.51system 1:18.80elapsed 206%CPU (0avgtext+0avgdata 356632maxresident)k
8477inputs+0outputs (12major+131074minor)pagefaults 0swaps
> time env OMP_THREAD_LIMIT=8 tesseract omp4.png stdout >/dev/null
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 143
160.30user 1.62system 1:18.01elapsed 207%CPU (0avgtext+0avgdata 356640maxresident)k
821inputs+0outputs (0major+131078minor)pagefaults 0swaps
2019-09-20 17:12:36 -07:00
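The timings above support the cap. Going from 2 to 3 threads cuts wall time from 1:26.26 to 1:19.56. Going to 4 threads saves under a second (1:18.80) but adds about 24 s of user CPU time (136.89 to 161.31).

A minimal sketch of applying such a cap when launching Tesseract follows. It assumes the limit is passed through the `OMP_THREAD_LIMIT` environment variable, as in the benchmark commands. The names `tesseract_env` and `MAX_TESSERACT_THREADS` are hypothetical.

```python
import os
from typing import Mapping, Optional

# Hypothetical constant: the benchmarks show little gain beyond 3 threads.
MAX_TESSERACT_THREADS = 3


def tesseract_env(
    requested_threads: int, base: Optional[Mapping[str, str]] = None
) -> dict:
    """Return a copy of the environment with OMP_THREAD_LIMIT capped at 3.

    `base` defaults to the current process environment; the result is meant
    to be passed as `env=` to subprocess.run when invoking tesseract.
    """
    env = dict(os.environ if base is None else base)
    # Clamp to the range 1..MAX_TESSERACT_THREADS.
    limit = max(1, min(requested_threads, MAX_TESSERACT_THREADS))
    env["OMP_THREAD_LIMIT"] = str(limit)
    return env
```

For example, `subprocess.run(["tesseract", "omp4.png", "stdout"], env=tesseract_env(8), check=True)` would run Tesseract with at most 3 OpenMP threads. A user's own `OMP_THREAD_LIMIT` setting is overwritten here, which is a design choice this sketch makes for simplicity.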
James R. Barlow
de61530d4d docs: fix intermediate file list for v9 2019-09-20 17:02:35 -07:00
James R. Barlow
c149f860b5 Add contributing guide 2019-09-20 17:02:22 -07:00
James R. Barlow
68c852acec Remove test_tesseract_config_invalid from suite
Also causes problems in CI
2019-09-18 13:28:02 -07:00
James R. Barlow
a8565bac6e Fix any False argument passed to the ocrmypdf.ocr() API being set to True 2019-09-15 01:47:31 -07:00
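The commit does not say what caused this bug. One common way this class of bug arises is treating any falsy argument as "not given" (compare 7755c5c5a7, "fix interpretation of None as omitted argument"). The hypothetical sketch below contrasts the two checks. The option names `flag_a` and `flag_b` are illustrative only and are not OCRmyPDF's code.

```python
from typing import Any, Dict

# Hypothetical defaults, for illustration only.
DEFAULTS: Dict[str, Any] = {"flag_a": True, "flag_b": True}


def merge_buggy(**kwargs: Any) -> Dict[str, Any]:
    # Bug: `value or default` replaces an explicit False with the default.
    return {k: kwargs.get(k) or v for k, v in DEFAULTS.items()}


def merge_fixed(**kwargs: Any) -> Dict[str, Any]:
    # Only None means "argument omitted"; an explicit False is kept.
    opts = dict(DEFAULTS)
    for k, v in kwargs.items():
        if v is not None:
            opts[k] = v
    return opts
```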
James R. Barlow
6e8b0c3194 Fix py36 test including 37 2019-09-15 01:47:10 -07:00
James R. Barlow
ff860e8362 Fix black settings in pyproject.toml 2019-09-15 01:46:13 -07:00
James R. Barlow
cf4b04c5d1 optimize: work around pikepdf 1.6.3 limitation with indexed ICCbased colorspaces 2019-09-11 12:56:27 -07:00
James R. Barlow
078bc2abe9 pdfa: assume 3 RGB channels always 2019-09-11 12:55:38 -07:00
James R. Barlow
d7b7ca0574 v9.0.3 notes; Remove test_tesseract_config_notfound from suite v9.0.3 2019-09-05 13:39:43 -07:00
James R. Barlow
17ac9d7a9a Embed ICC profile in .ps (fixing Ghostscript 9.28 compatibility)
Previously we included the ICC profile's
filename, which required PostScript to run with file access enabled. For
security, Ghostscript 9.28 enables ``-dSAFER`` and as such no longer
permits access to any file by default. This fix is necessary for
compatibility with Ghostscript 9.28.

We use ASCII85 for a slightly more compact representation.
2019-09-05 13:17:26 -07:00
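A minimal sketch of embedding the profile inline as a PostScript ASCII base-85 string literal (`<~ ... ~>`) follows, using Python's `base64.a85encode` with Adobe framing. ASCII85 writes 5 characters per 4 bytes of input. The helper name `icc_profile_ps_def` and the `/ICCProfile` name are assumptions. How the definition is then consumed (for example, by pdfmark operators in the PDF/A prefix file) is not shown here.

```python
import base64


def icc_profile_ps_def(icc_bytes: bytes, name: str = "ICCProfile") -> str:
    """Return a PostScript `def` that holds the ICC profile data inline.

    Hypothetical helper. The profile is written as an ASCII base-85 string
    literal, so the interpreter never needs to open a file (which -dSAFER
    forbids by default).
    """
    # adobe=True adds the <~ ... ~> delimiters; wrapcol keeps lines short.
    # PostScript ignores the whitespace inside the literal.
    encoded = base64.a85encode(icc_bytes, adobe=True, wrapcol=76).decode("ascii")
    return f"/{name} {encoded} def\n"
```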
James R. Barlow
a2a197ce4c v9.0.2 release notes v9.0.2 2019-09-04 02:34:21 -07:00
James R. Barlow
944d59e5ad Fix --print-parameters issue when chi_sim is not installed 2019-09-04 01:17:52 -07:00
James R. Barlow
1c3e90a892 optimize: solve monochrome by converting to G4 2019-09-04 00:51:47 -07:00
James R. Barlow
c728836956 Adjust test requirements 2019-09-04 00:50:48 -07:00
James R. Barlow
0d80fab339 Remove restriction on pytest < 5 2019-09-03 23:47:55 -07:00
James R. Barlow
a650caa599 optimize: don't consider 1bpp images for PNG optimization 2019-09-03 23:47:20 -07:00
James R. Barlow
c6caff90a1 optimize: only re-insert pngs after pngquant
Previously we attempted to reinsert all PNGs, but it seems
unlikely that Leptonica's API can actually optimize the PNG
before it is inserted.

In any event, qpdf has also gained image optimization capabilities,
which we could borrow.
2019-09-03 23:46:25 -07:00
James R. Barlow
671c88d3b5 optimize: exclude images with custom Decode tables 2019-09-03 23:37:23 -07:00
James R. Barlow
b2cfaedf91 optimize: Don't reinsert 1bpp images
There seem to be version-to-version inconsistencies in
Leptonica's photometric interpretation of 1bpp images; in
particular, commit a0692307 introduces a change to force transcoding
in this situation.

However, I never entirely got to the bottom of the problem,
and in any event 1bpp images are probably better optimized
by JBIG2 than by pngquant, so we're going to stop running them through
pngquant.
2019-09-03 23:26:13 -07:00
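Together with a650caa599 and 671c88d3b5, this commit narrows which images go through pngquant. A minimal sketch of such an eligibility filter follows. It uses a plain mapping of PDF image dictionary keys in place of a real pikepdf object, and the function name is hypothetical. Treating any `/Decode` entry as "custom" is a simplifying assumption.

```python
from typing import Any, Mapping


def eligible_for_pngquant(image_dict: Mapping[str, Any]) -> bool:
    """Hypothetical filter: should this PDF image be sent to pngquant?

    `image_dict` is a plain mapping of image dictionary keys such as
    "/BitsPerComponent", "/ImageMask" and "/Decode".
    """
    # Image masks are always 1 bit per sample.
    if image_dict.get("/ImageMask", False):
        return False
    # 1bpp images are left to JBIG2 instead (this commit and a650caa599).
    if image_dict.get("/BitsPerComponent", 8) == 1:
        return False
    # Assumption: any Decode array counts as custom (671c88d3b5).
    if "/Decode" in image_dict:
        return False
    return True
```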
James R. Barlow
19ba3ae011 Allow test_german to xfail if deu language is not installed 2019-09-03 17:38:54 -07:00
James R. Barlow
feff1e38bb Use context managers to ensure Pillow images are closed 2019-09-03 17:19:12 -07:00
James R. Barlow
c8d6ea6b10 Fix tests broken by --print-parameters change 2019-09-03 17:17:24 -07:00
James R. Barlow
b0d9775343 Attempt to resolve black-inversion issue 2019-08-31 01:25:36 -07:00
James R. Barlow
462bfb84fb install: affirm that we now require Tesseract beta 2019-08-31 01:24:31 -07:00
James R. Barlow
11ef78a891 Fix exception raised when running without eng.traineddata installed 2019-08-27 14:54:03 -07:00
James R. Barlow
638eb556ef Reactivate user-words test that was always skipped 2019-08-27 14:52:59 -07:00
James R. Barlow
fdefcd8af2 travis: Make 3.7 the build leader/deployer 2019-08-26 13:30:07 -07:00
James R. Barlow
09457edad3 alpine: use jbig2enc@community 2019-08-26 12:49:47 -07:00
James R. Barlow
6460a7eb3e docs: leptonica.com -> .org 2019-08-26 12:07:34 -07:00
James R. Barlow
707ebeb151 docs: installation updates 2019-08-11 18:48:56 -07:00
James R. Barlow
e9bc093842 v9.0.1 release notes v9.0.1 2019-08-11 17:14:11 -07:00
James R. Barlow
2eeaca1168 travis: make minimal config even more minimal 2019-08-11 17:13:55 -07:00
James R. Barlow
7755c5c5a7 tests: fix interpretation of None as omitted argument 2019-08-11 16:58:22 -07:00
James R. Barlow
793348a47c tests: mark test as requiring pngquant 2019-08-11 16:58:22 -07:00
James R. Barlow
b241f66919 travis: Add a minimal Ubuntu config 2019-08-11 16:58:06 -07:00
James R. Barlow
8ad034a678 docs: update install on FreeBSD to point to ports 2019-08-11 15:50:52 -07:00
James R. Barlow
a1a7b973e9 tests: split out stdin/stdout tests 2019-08-09 01:23:49 -07:00
James R. Barlow
7bfcd0a9d5 Use pikepdf 1.6.1 2019-08-09 01:12:13 -07:00
James R. Barlow
f276c4ef1e Alpine Docker: jbig2enc moved from testing to community 2019-08-09 01:09:18 -07:00
James R. Barlow
77bbc22c50 Ensure --image-dpi on non-image produces a warning 2019-08-09 01:08:16 -07:00
James R. Barlow
a6805ed343 Travis: remove vestiges of pdfminer being optional on osx 2019-07-30 00:42:38 -07:00
James R. Barlow
c4afc5c242 Add missing item from v9.0.0 release notes 2019-07-30 00:39:14 -07:00