8 Commits

Author SHA1 Message Date
James R. Barlow
8c17c9918e Add documentation and test cases for --tesseract-config
This parameter has existed for a long time but never really got any
attention.
2017-01-28 22:06:51 -08:00
James R. Barlow
a4f07756a5 tesseract caching: don't transcode tesseract's output, hash source file
For sanity's sake, deal with tesseract streams in binary without
transcoding (via universal_newlines, etc.). The only differences are
in the printed messages regarding spoofing.

Also hash the source file so that changes to the cache mechanism
invalidate the old cache automatically. That is probably too aggressive,
but simple and safer than the previous approach.
2016-10-28 16:44:12 -07:00
James R. Barlow
cc7e328358 Improve some documentation for tests 2016-08-26 15:04:08 -07:00
James R. Barlow
8246cc0538 Gracefully recover from tesseract's failure to process very large images
Add test cases to check this.
2016-02-20 04:53:23 -08:00
James R. Barlow
b907234d5c Update tesseract spoofing to cache orientation and script detection checks
No cache: 269 s
With cache: 144 s

test_oversample[tesseract] now fails; all others pass
2016-02-08 02:21:56 -08:00
James R. Barlow
3b53e9adac Use tesseract cache for -psm 2016-01-11 17:22:50 -08:00
James R. Barlow
09782242c8 Adjust test cases to use cache and noop more effectively
This reduces total execution time to 164s on my machine, down from
about double that.
2015-12-17 14:00:17 -08:00
James R. Barlow
9ec4aa039d Add tesseract caching to speed up tests 2015-12-17 12:52:12 -08:00