2676 Commits

Author SHA1 Message Date
James R. Barlow
131a5b741d tesseract.py: update canned HOCR template to tess 3.05 output
Seems better to not claim the existence of several entities that don’t
exist as the older one does
2017-05-14 23:40:09 -07:00
James R. Barlow
65b89687a9 ghostscript: fix missing “import sys”, only applicable for an exception 2017-05-14 23:38:52 -07:00
James R. Barlow
048ae40e75 Update copyrights 2017-05-14 23:38:28 -07:00
James R. Barlow
234183ecd2 Fix: Tesseract 3.04 is sensitive to order of configuration commands
“txt hocr” is not acceptable and does not produce expected output .txt
while “hocr txt” works fine, so switch the order everywhere.

Should fix #169
2017-05-14 23:27:46 -07:00
James R. Barlow
fb067dc97b cookbook: more on improving OCR 2017-05-14 23:16:47 -07:00
James R. Barlow
a1fea0ce16 docs: link to OCRmyPDF-web 2017-05-14 23:16:30 -07:00
James R. Barlow
e1e9135e93 Test suite: tidy up imports 2017-05-14 23:15:29 -07:00
James R. Barlow
aff982036b autobrew: fix brew audit error on double blank line 2017-05-14 23:15:02 -07:00
James R. Barlow
d087649eab Remove “null deploy script” since “/usr/bin/true” is equivalent 2017-05-12 15:37:02 -07:00
James R. Barlow
7f3fa46a40 v5.0 release notes v5.0 2017-05-12 14:14:28 -07:00
James R. Barlow
b1f79e4d97 Disable other use redo_ocr 2017-05-12 13:24:30 -07:00
James R. Barlow
115d6df94f Warn user when --image-dpi is supplied but ignored 2017-05-12 12:09:53 -07:00
James R. Barlow
559af9635f --redo-ocr is not implemented, so disable 2017-05-12 12:08:16 -07:00
James R. Barlow
cb06359c0b Turn on Tesseract 4 cache in test suite
Travis is too slow without it, and perhaps it’s overly paranoid to
never cache Tess4. Maybe nuke the cache occasionally to be safe…
2017-05-12 11:42:27 -07:00
James R. Barlow
5e26bb29d9 Update requirements files 2017-05-12 11:41:15 -07:00
James R. Barlow
b0e95842b8 Fix Travis CI errors while looking around for Tess4 2017-05-12 00:40:00 -07:00
James R. Barlow
08e678f21f rst: Clean up indentation 2017-05-12 00:12:06 -07:00
James R. Barlow
c17817810f Update documentation for 3.03 support removal 2017-05-12 00:08:22 -07:00
James R. Barlow
ff5c38b1f7 Tell Travis to download Tesseract 4.00 from a PPA for testing 2017-05-11 23:52:13 -07:00
James R. Barlow
64314c1b82 Insist on Python 3.5 wherever we check for it 2017-05-11 23:51:45 -07:00
James R. Barlow
83230097ae Insist on Tesseract 3.04 wherever we check for it 2017-05-11 23:51:28 -07:00
James R. Barlow
8f91acf956 Remove Tesseract 3.02 and 3.03 compatibility shims 2017-05-11 23:50:52 -07:00
James R. Barlow
d211722a2f .gitignore the docs Makefile 2017-05-11 23:28:52 -07:00
James R. Barlow
56e6ed1249 Fix missing import; all tests passing! 2017-05-11 23:28:05 -07:00
James R. Barlow
21982cf1cb baiona_gray remove alpha channel 2017-05-11 23:23:37 -07:00
James R. Barlow
edc01408da Update the .png files, again, hopefully without corruption 2017-05-11 23:20:50 -07:00
James R. Barlow
aee33c87ed Merge release notes 2017-05-11 23:11:12 -07:00
James R. Barlow
0dae1602c7 Fix missing import PIPE 2017-05-11 23:07:20 -07:00
James R. Barlow
d926f07ac1 Stop git from corrupting .pngs
Grrr.
2017-05-11 23:07:06 -07:00
James R. Barlow
96045e98f4 Update develop with master changes
We’re well out of the “trivial updates” zone
2017-05-11 22:54:27 -07:00
James R. Barlow
01b7205e2c Ensure skipped pages are explained in sidecars 2017-05-11 00:43:36 -07:00
James R. Barlow
c8a4cbcf17 Fix test suite breakage after sidecar feature added
Forgot to update tesseract spoofers to account for change in tesseract
parameters.  Also the change to outputting multiple files in the collate
steps affected how ruffus passes information into downstream consumers
of those files.
2017-05-11 00:17:24 -07:00
James R. Barlow
16b6442b23 Add changes to __main__.py that should have been in last commit 2017-05-10 17:55:42 -07:00
James R. Barlow
183eafa587 Implement sidecar text files (#126) 2017-05-10 15:22:44 -07:00
James R. Barlow
47a2997538 Reorganize --help text 2017-05-10 12:19:56 -07:00
James R. Barlow
37ebcadfa1 Implement --user-words, --user-patterns 2017-05-09 17:54:56 -07:00
James R. Barlow
74d98216f1 Update documentation for Ghostscript behavior 2017-05-09 17:43:39 -07:00
James R. Barlow
4bdebf573e Tell Travis CI to use multiple cores
Let’s see if this helps the build go faster
2017-05-09 17:24:32 -07:00
James R. Barlow
1606b6a383 Add --quiet (fixes #143), stop using ruffus to partially generate argparser 2017-05-09 17:24:06 -07:00
James R. Barlow
2a61902df5 Merge commit 'c4f01de231d22da5cea02c25aa581a965a37640b' 2017-05-09 16:37:55 -07:00
James R. Barlow
01a1c2b576 Implement --pdfa-image-compression to control Ghostscript’s compression
Fixes #163
2017-05-09 16:37:29 -07:00
Ingo Feinerer
c4f01de231 Fix typo "cutput" -> "output" (#164)
[ci skip]
2017-05-09 16:22:10 -07:00
James R. Barlow
63a4a761dd Revert "v4.5.7 release notes"
The change introduced regressions, so find another way to fix.

This reverts commit d077c03686981c1601305cac2eb7b97e7f823a34.

[ci skip]
2017-05-08 14:38:04 -07:00
James R. Barlow
d077c03686 v4.5.7 release notes 2017-05-06 22:34:54 -07:00
James R. Barlow
c97ea1f2a9 Update high DPI test case to confirm the output image is not downsampled 2017-05-06 22:34:01 -07:00
James R. Barlow
fd27df2abb Update documentation to warn that transparency is not tested 2017-05-06 22:33:24 -07:00
James R. Barlow
bf04f03c4c Fix corrupt test file “typewriter.png”
This file is not currently used in any tests, but could be, so replace
corrupt version with a useful one.
2017-05-06 22:28:34 -07:00
James R. Barlow
93e802f473 Fix issue #163, color and grayscale images JPEG compressed when not needed 2017-05-06 22:27:25 -07:00
James R. Barlow
1464b9087a Try Travis again with null deploy for OSX 2017-05-01 17:37:59 -07:00
James R. Barlow
e8cc8fc879 Add travis null_deploy for osx 2017-05-01 17:24:43 -07:00