James R. Barlow
3946bba318
Too soon, try again
2016-02-16 03:51:08 -08:00
James R. Barlow
2ed0b78a7b
Travis: are you creating _leptonica.py?
2016-02-16 03:47:35 -08:00
James R. Barlow
ed346d032c
Does Travis need explicit install libffi-dev?
2016-02-16 03:41:11 -08:00
James R. Barlow
acd645f192
Fix travis syntax error
2016-02-16 02:42:47 -08:00
James R. Barlow
88433e4c34
Fiddle with travis, try to get better debug output
...
Essentially cffi failed somehow, not clear how
2016-02-16 02:12:14 -08:00
James R. Barlow
1224af1780
Update test resources to address files with unknown source
...
-Remove Test_Issue_28.pdf (inherited from fritz-hh, source unknown)
-Replace missing_docinfo.pdf (received from a user, but it is a printout of
a website with unclear status, so created a new PDF with the same effect)
-Others are okay
2016-02-16 00:28:28 -08:00
James R. Barlow
ab13342931
Revise rotation tests in prep for adding a few more
2016-02-15 17:17:43 -08:00
James R. Barlow
d7913da484
Test case: remove filename conflict
2016-02-15 16:49:28 -08:00
James R. Barlow
c50e3f1329
Complain about older tesseracts that don't have sharp2.ttf installed
2016-02-15 16:43:41 -08:00
James R. Barlow
a62f86dbd7
Update release notes
2016-02-15 16:43:14 -08:00
James R. Barlow
33b88b18db
Update the notes
2016-02-15 14:03:59 -08:00
James R. Barlow
7c691c21ab
Fix image layer rotation for pages with nonzero crop boxes
2016-02-10 17:48:33 -08:00
James R. Barlow
4ec51729d8
Partial fix for images not anchored to (0, 0)
2016-02-10 17:14:48 -08:00
James R. Barlow
07b41e479a
Cleaner access to mediabox
2016-02-09 02:19:05 -08:00
James R. Barlow
6510bcad19
DPI information not transferred automatically from PNG to JPEG
2016-02-09 02:18:54 -08:00
James R. Barlow
265d2ce39b
Better skewed image
2016-02-08 23:44:46 -08:00
James R. Barlow
1928a64cae
Better logging output for autorotation
2016-02-08 23:42:25 -08:00
James R. Barlow
11e575a5a3
leptonica: suppress debug output
2016-02-08 23:41:45 -08:00
James R. Barlow
7fbc0d6460
tesseract: unify logging function
2016-02-08 23:40:36 -08:00
James R. Barlow
1ba8b1aa4b
unpaper is lousy at deskewing, so let leptonica do it
2016-02-08 15:26:33 -08:00
James R. Barlow
3569c76c0f
Also include cardinal.pdf
2016-02-08 15:23:04 -08:00
James R. Barlow
16c7ac2582
Fix test_deskew for new Leptonica API
2016-02-08 15:20:01 -08:00
James R. Barlow
4ceb59215f
Leptonica: classes are better
2016-02-08 15:14:44 -08:00
James R. Barlow
2e6879ee51
Introduce Leptonica class for Pix
2016-02-08 14:52:01 -08:00
James R. Barlow
66fc2e9d7d
Add rotate 180 correlation sanity check
2016-02-08 13:10:11 -08:00
James R. Barlow
2c7a6e574f
Shorten names of _make_input/output
2016-02-08 12:57:26 -08:00
James R. Barlow
78c3bf5dba
Check autorotate using leptonica correlation
2016-02-08 12:55:50 -08:00
James R. Barlow
98c115e3bb
Cache wasn't enabled properly for test_autorotate
2016-02-08 12:55:28 -08:00
James R. Barlow
2752bda80b
Merge branch 'feature/leptdeskew' into feature/logging
...
Need leptonica for testing now, I think
# Conflicts:
# ocrmypdf/tesseract.py
# requirements.txt
# setup.py
2016-02-08 12:34:48 -08:00
James R. Barlow
7c0940609a
Take a stab at writing test case for autorotate
2016-02-08 12:32:39 -08:00
James R. Barlow
d30a879e2d
Fix test suite by running select_image_for_pdf unconditionally
...
The change that caused the problem was a minor optimization for the
tesseract renderer path: it pulled an image from select_image_for_pdf
so that it could use a JPEG instead of a PNG, rather than taking it from
preprocess_clean, where it would only get a PNG and produce large files.
2016-02-08 02:33:03 -08:00
James R. Barlow
b907234d5c
Update tesseract spoofing to cache orientation and script detection checks
...
No cache: 269 s
With cache: 144 s
test_oversample[tesseract] now fails, all others good
2016-02-08 02:21:56 -08:00
James R. Barlow
b0114c9174
More logging improvements
2016-02-08 01:31:15 -08:00
James R. Barlow
d2ba8c501f
Restore invisibletext for normal output
2016-02-08 01:14:39 -08:00
James R. Barlow
6a7ed7d359
Make logging output a lot more useful
2016-02-08 00:58:14 -08:00
James R. Barlow
6289afa1a6
Better: custom logging factory to avoid whatever ruffus is doing
2016-02-08 00:18:52 -08:00
James R. Barlow
9bb6fa04cb
Return logging to a semblance of normalcy
2016-02-08 00:09:31 -08:00
James R. Barlow
afb6f6f5c9
Render preview as .jpg instead of .png
...
Smaller file size of JPEG seems to help performance, although the
difference is only about 1%.
2016-02-07 15:49:10 -08:00
James R. Barlow
8a69671dbd
Suppress debug message
2016-02-07 15:43:57 -08:00
James R. Barlow
178aee4687
Make rotation optional (for now it's off, possibly should be on)
2016-02-07 15:43:45 -08:00
James R. Barlow
8484caddfb
Tweak pipeline, allowing --pdf-renderer to use JPEGs instead of PNGs
2016-02-07 15:36:51 -08:00
James R. Barlow
08313316de
Clean up auto-rotation
2016-02-07 15:06:54 -08:00
James R. Barlow
1d0eca5c63
All four rotation directions working
2016-02-07 06:09:01 -08:00
James R. Barlow
fe89232a30
Fix autorotate for some lossless cases
2016-02-07 05:59:46 -08:00
James R. Barlow
4b51b521e2
Implement autorotate (provided lossless reconstruction is disabled)
...
Works for a single-page file, probably.
Arguably rotation is not quite lossless, and the two could be
mutually exclusive anyway, so maybe this is it. Did not check in some
debugging changes (lossless=False, text debugging=True).
PyPDF seems to get merging wrong when one of the pages is rotated.
2016-02-07 03:27:33 -08:00
James R. Barlow
e9ec458304
tesseract: add command to access OSD values
2016-02-07 03:21:32 -08:00
James R. Barlow
54b0ddd787
ghostscript: don't try to "help" autorotation
...
It uses text direction alone, which is an unreliable guide.
2016-02-07 03:20:42 -08:00
jbarlow83
93bec22f9c
README: mention polyglot, fix container vs image
2016-02-07 00:32:20 -08:00
James R. Barlow
0dc96442d8
Fix img2pdf usage in test case (to make Travis CI happy again)
2016-02-06 23:41:32 -08:00
James R. Barlow
58f4582517
More Dockerfile repair
...
I'm not fully happy with this arrangement, as it effectively downloads
OCRmyPDF twice, not to mention the lengthy setup time overall.
Will need to try separate build/run images in the future, but for now
just get it working again.
2016-02-06 23:13:16 -08:00