James R. Barlow
131a5b741d
tesseract.py: update canned HOCR template to tess 3.05 output
...
It seems better not to claim the existence of several entities that don’t
exist, as the older template does
2017-05-14 23:40:09 -07:00
James R. Barlow
65b89687a9
ghostscript: fix missing “import sys”, only applicable for an exception
2017-05-14 23:38:52 -07:00
James R. Barlow
048ae40e75
Update copyrights
2017-05-14 23:38:28 -07:00
James R. Barlow
234183ecd2
Fix: Tesseract 3.04 is sensitive to order of configuration commands
...
“txt hocr” is not acceptable and does not produce the expected .txt output,
while “hocr txt” works fine, so switch the order everywhere.
Should fix #169
2017-05-14 23:27:46 -07:00
James R. Barlow
fb067dc97b
cookbook: more on improving OCR
2017-05-14 23:16:47 -07:00
James R. Barlow
a1fea0ce16
docs: link to OCRmyPDF-web
2017-05-14 23:16:30 -07:00
James R. Barlow
e1e9135e93
Test suite: tidy up imports
2017-05-14 23:15:29 -07:00
James R. Barlow
aff982036b
autobrew: fix brew audit error on double blank line
2017-05-14 23:15:02 -07:00
James R. Barlow
d087649eab
Remove “null deploy script” since “/usr/bin/true” is equivalent
2017-05-12 15:37:02 -07:00
James R. Barlow
7f3fa46a40
v5.0 release notes
v5.0
2017-05-12 14:14:28 -07:00
James R. Barlow
b1f79e4d97
Disable other use of redo_ocr
2017-05-12 13:24:30 -07:00
James R. Barlow
115d6df94f
Warn user when --image-dpi is supplied but ignored
2017-05-12 12:09:53 -07:00
James R. Barlow
559af9635f
--redo-ocr is not implemented, so disable
2017-05-12 12:08:16 -07:00
James R. Barlow
cb06359c0b
Turn on Tesseract 4 cache in test suite
...
Travis is too slow without it, and perhaps it’s overly paranoid to
never cache Tess4. Maybe nuke the cache occasionally to be safe…
2017-05-12 11:42:27 -07:00
James R. Barlow
5e26bb29d9
Update requirements files
2017-05-12 11:41:15 -07:00
James R. Barlow
b0e95842b8
Fix Travis CI errors while looking around for Tess4
2017-05-12 00:40:00 -07:00
James R. Barlow
08e678f21f
rst: Clean up indentation
2017-05-12 00:12:06 -07:00
James R. Barlow
c17817810f
Update documentation for 3.03 support removal
2017-05-12 00:08:22 -07:00
James R. Barlow
ff5c38b1f7
Tell Travis to download Tesseract 4.00 from a PPA for testing
2017-05-11 23:52:13 -07:00
James R. Barlow
64314c1b82
Insist on Python 3.5 wherever we check for it
2017-05-11 23:51:45 -07:00
James R. Barlow
83230097ae
Insist on Tesseract 3.04 wherever we check for it
2017-05-11 23:51:28 -07:00
James R. Barlow
8f91acf956
Remove Tesseract 3.02 and 3.03 compatibility shims
2017-05-11 23:50:52 -07:00
James R. Barlow
d211722a2f
.gitignore the docs Makefile
2017-05-11 23:28:52 -07:00
James R. Barlow
56e6ed1249
Fix missing import; all tests passing!
2017-05-11 23:28:05 -07:00
James R. Barlow
21982cf1cb
baiona_gray remove alpha channel
2017-05-11 23:23:37 -07:00
James R. Barlow
edc01408da
Update the .png files, again, hopefully without corruption
2017-05-11 23:20:50 -07:00
James R. Barlow
aee33c87ed
Merge release notes
2017-05-11 23:11:12 -07:00
James R. Barlow
0dae1602c7
Fix missing import PIPE
2017-05-11 23:07:20 -07:00
James R. Barlow
d926f07ac1
Stop git from corrupting .pngs
...
Grrr.
2017-05-11 23:07:06 -07:00
James R. Barlow
96045e98f4
Update develop with master changes
...
We’re well out of the “trivial updates” zone
2017-05-11 22:54:27 -07:00
James R. Barlow
01b7205e2c
Ensure skipped pages are explained in sidecars
2017-05-11 00:43:36 -07:00
James R. Barlow
c8a4cbcf17
Fix test suite breakage after sidecar feature added
...
Forgot to update tesseract spoofers to account for change in tesseract
parameters. Also the change to outputting multiple files in the collate
steps affected how ruffus passes information into downstream consumers
of those files.
2017-05-11 00:17:24 -07:00
James R. Barlow
16b6442b23
Add changes to __main__.py that should have been in last commit
2017-05-10 17:55:42 -07:00
James R. Barlow
183eafa587
Implement sidecar text files (#126)
2017-05-10 15:22:44 -07:00
James R. Barlow
47a2997538
Reorganize --help text
2017-05-10 12:19:56 -07:00
James R. Barlow
37ebcadfa1
Implement --user-words, --user-patterns
2017-05-09 17:54:56 -07:00
James R. Barlow
74d98216f1
Update documentation for Ghostscript behavior
2017-05-09 17:43:39 -07:00
James R. Barlow
4bdebf573e
Tell Travis CI to use multiple cores
...
Let’s see if this helps the build go faster
2017-05-09 17:24:32 -07:00
James R. Barlow
1606b6a383
Add --quiet (fixes #143), stop using ruffus to partially generate argparser
2017-05-09 17:24:06 -07:00
James R. Barlow
2a61902df5
Merge commit 'c4f01de231d22da5cea02c25aa581a965a37640b'
2017-05-09 16:37:55 -07:00
James R. Barlow
01a1c2b576
Implement --pdfa-image-compression to control Ghostscript’s compression
...
Fixes #163
2017-05-09 16:37:29 -07:00
Ingo Feinerer
c4f01de231
Fix typo "cutput" -> "output" (#164)
...
[ci skip]
2017-05-09 16:22:10 -07:00
James R. Barlow
63a4a761dd
Revert "v4.5.7 release notes"
...
The change introduced regressions, so find another way to fix.
This reverts commit d077c03686981c1601305cac2eb7b97e7f823a34.
[ci skip]
2017-05-08 14:38:04 -07:00
James R. Barlow
d077c03686
v4.5.7 release notes
2017-05-06 22:34:54 -07:00
James R. Barlow
c97ea1f2a9
Update high DPI test case to confirm the output image is not downsampled
2017-05-06 22:34:01 -07:00
James R. Barlow
fd27df2abb
Update documentation to warn that transparency is not tested
2017-05-06 22:33:24 -07:00
James R. Barlow
bf04f03c4c
Fix corrupt test file “typewriter.png”
...
This file is not currently used in any tests, but could be, so replace
corrupt version with a useful one.
2017-05-06 22:28:34 -07:00
James R. Barlow
93e802f473
Fix issue #163, color and grayscale images JPEG compressed when not needed
2017-05-06 22:27:25 -07:00
James R. Barlow
1464b9087a
Try Travis again with null deploy for OSX
2017-05-01 17:37:59 -07:00
James R. Barlow
e8cc8fc879
Add travis null_deploy for osx
2017-05-01 17:24:43 -07:00