2676 Commits

Author SHA1 Message Date
James R. Barlow
3946bba318 Too soon, try again 2016-02-16 03:51:08 -08:00
James R. Barlow
2ed0b78a7b Travis: are you creating _leptonica.py? 2016-02-16 03:47:35 -08:00
James R. Barlow
ed346d032c Does Travis need explicit install libffi-dev? 2016-02-16 03:41:11 -08:00
James R. Barlow
acd645f192 Fix travis syntax error 2016-02-16 02:42:47 -08:00
James R. Barlow
88433e4c34 Fiddle with travis, try to get better debug output
Essentially cffi failed somehow, not clear how
2016-02-16 02:12:14 -08:00
James R. Barlow
1224af1780 Update test resources to address files with unknown source
-Remove Test_Issue_28.pdf (inherited from fritz-hh, source unknown)
-Replace missing_docinfo.pdf (received from user, but it's a printout of
a website; unclear status, so created a new PDF with the same effect)
-Others are okay
2016-02-16 00:28:28 -08:00
James R. Barlow
ab13342931 Revise rotation tests in prep for adding a few more 2016-02-15 17:17:43 -08:00
James R. Barlow
d7913da484 Test case: remove filename conflict 2016-02-15 16:49:28 -08:00
James R. Barlow
c50e3f1329 Complain about older tesseracts that don't have sharp2.ttf installed 2016-02-15 16:43:41 -08:00
James R. Barlow
a62f86dbd7 Update release notes 2016-02-15 16:43:14 -08:00
James R. Barlow
33b88b18db Update the notes 2016-02-15 14:03:59 -08:00
James R. Barlow
7c691c21ab Fix image layer rotation for pages with nonzero crop boxes 2016-02-10 17:48:33 -08:00
James R. Barlow
4ec51729d8 Partial fix for images not anchored to (0, 0) 2016-02-10 17:14:48 -08:00
James R. Barlow
07b41e479a Cleaner access to mediabox 2016-02-09 02:19:05 -08:00
James R. Barlow
6510bcad19 DPI information not transferred automatically from PNG to JPEG 2016-02-09 02:18:54 -08:00
James R. Barlow
265d2ce39b Better skewed image 2016-02-08 23:44:46 -08:00
James R. Barlow
1928a64cae Better logging output for autorotation 2016-02-08 23:42:25 -08:00
James R. Barlow
11e575a5a3 leptonica: suppress debug output 2016-02-08 23:41:45 -08:00
James R. Barlow
7fbc0d6460 tesseract: unify logging function 2016-02-08 23:40:36 -08:00
James R. Barlow
1ba8b1aa4b unpaper is lousy at deskewing, so let leptonica do it 2016-02-08 15:26:33 -08:00
James R. Barlow
3569c76c0f Also include cardinal.pdf 2016-02-08 15:23:04 -08:00
James R. Barlow
16c7ac2582 Fix test_deskew for new Leptonica API 2016-02-08 15:20:01 -08:00
James R. Barlow
4ceb59215f Leptonica: classes are better 2016-02-08 15:14:44 -08:00
James R. Barlow
2e6879ee51 Introduce Leptonica class for Pix 2016-02-08 14:52:01 -08:00
James R. Barlow
66fc2e9d7d Add rotate 180 correlation sanity check 2016-02-08 13:10:11 -08:00
James R. Barlow
2c7a6e574f Shorten names of _make_input/output 2016-02-08 12:57:26 -08:00
James R. Barlow
78c3bf5dba Check autorotate using leptonica correlation 2016-02-08 12:55:50 -08:00
James R. Barlow
98c115e3bb Cache wasn't enabled properly for test_autorotate 2016-02-08 12:55:28 -08:00
James R. Barlow
2752bda80b Merge branch 'feature/leptdeskew' into feature/logging
Need leptonica for testing now, I think
# Conflicts:
#	ocrmypdf/tesseract.py
#	requirements.txt
#	setup.py
2016-02-08 12:34:48 -08:00
James R. Barlow
7c0940609a Take a stab at writing test case for autorotate 2016-02-08 12:32:39 -08:00
James R. Barlow
d30a879e2d Fix test suite by running select_image_for_pdf unconditionally
The change that caused the problem was a minor optimization for the
tesseract renderer path: it pulled the image from select_image_for_pdf
so that it could use a JPEG, rather than taking it from
preprocess_clean, where it would only get a PNG and produce large files.
2016-02-08 02:33:03 -08:00
James R. Barlow
b907234d5c Update tesseract spoofing to cache orientation and script detection checks
No cache: 269 s
With cache: 144 s

test_oversample[tesseract] now fails, all others good
2016-02-08 02:21:56 -08:00
James R. Barlow
b0114c9174 More logging improvements 2016-02-08 01:31:15 -08:00
James R. Barlow
d2ba8c501f Restore invisibletext for normal output 2016-02-08 01:14:39 -08:00
James R. Barlow
6a7ed7d359 Make logging output a lot more useful 2016-02-08 00:58:14 -08:00
James R. Barlow
6289afa1a6 Better: custom logging factory to avoid whatever ruffus is doing 2016-02-08 00:18:52 -08:00
James R. Barlow
9bb6fa04cb Return logging to a semblance of normalcy 2016-02-08 00:09:31 -08:00
James R. Barlow
afb6f6f5c9 Render preview as .jpg instead of .png
Smaller file size of JPEG seems to help performance, although the
difference is only about 1%.
2016-02-07 15:49:10 -08:00
James R. Barlow
8a69671dbd Suppress debug message 2016-02-07 15:43:57 -08:00
James R. Barlow
178aee4687 Make rotation optional (for now it's off, possibly should be on) 2016-02-07 15:43:45 -08:00
James R. Barlow
8484caddfb Tweak pipeline, allowing --pdf-renderer to use JPEGs instead of PNGs 2016-02-07 15:36:51 -08:00
James R. Barlow
08313316de Cleanup auto-rotation 2016-02-07 15:06:54 -08:00
James R. Barlow
1d0eca5c63 All four rotation directions working 2016-02-07 06:09:01 -08:00
James R. Barlow
fe89232a30 Fix autorotate for some lossless cases 2016-02-07 05:59:46 -08:00
James R. Barlow
4b51b521e2 Implement autorotate (provided lossless reconstruction is disabled)
Works for a single page file, probably

Arguably rotation is not quite lossless, and the two could be
mutually exclusive anyway, so maybe this is it. Did not check in some
debugging changes (lossless=False, text debugging=True).

PyPDF seems to get merging wrong when one of the pages is rotated.
2016-02-07 03:27:33 -08:00
James R. Barlow
e9ec458304 tesseract: add command to access OSD values 2016-02-07 03:21:32 -08:00
James R. Barlow
54b0ddd787 ghostscript: don't try to "help" autorotation
It uses text direction alone -- unreliable guide.
2016-02-07 03:20:42 -08:00
jbarlow83
93bec22f9c README: mention polyglot, fix container vs image 2016-02-07 00:32:20 -08:00
James R. Barlow
0dc96442d8 Fix img2pdf usage in test case (to make Travis CI happy again) 2016-02-06 23:41:32 -08:00
James R. Barlow
58f4582517 More Dockerfile repair
I'm not fully happy with this arrangement, as it effectively downloads
OCRmyPDF twice, not to mention the lengthy setup time overall.

Will need to try separate build/run images in the future, but for now
just get it working again.
2016-02-06 23:13:16 -08:00