291 Commits

Author SHA1 Message Date
James R. Barlow
b4a734fc0d Test case for "algorithm 4" test
Algorithm 4 -> PDF version 1.6
2016-06-23 13:21:26 -07:00
James R. Barlow
ff092c8629 Fix race condition between these tests when run in parallel 2016-04-28 00:39:15 -07:00
James R. Barlow
40baab32ac Remove dead code "import stuff in testcase" 2016-04-14 14:22:34 -07:00
James R. Barlow
e877d37ac8 --rotate-pages: Only apply rotation if we're reasonably confident
Take the threshold from tesseract's default value for -psm 1.
2016-04-14 13:49:44 -07:00
James R. Barlow
322085933b unpaper: fix check for missing and old versions, add test case 2016-03-10 15:37:09 -08:00
James R. Barlow
f3e06b2dbd Add bookmarks to file for more testing 2016-02-29 00:05:07 -08:00
James R. Barlow
570bbe9a05 Add comments and remove debugging, improve inline handling
Squashed commits:
[bfff3c9] pageinfo, have a main()
2016-02-27 00:18:36 -08:00
James R. Barlow
5cc3adb39a Add support for inline images 2016-02-27 00:18:36 -08:00
James R. Barlow
3957a0606c Compute image pixel density without performing rectangle intersection (+5 squashed commits)
Squashed commits:
[0e27904] Partially implement DPI calculation with rotation of the image

Fixes test suite
[a64f662] pageinfo: all tests pass
[c5b811a] Fix typos
[cdd2286] Can now find inline images more efficiently
[60dde8d] First cut at implementing intelligent DPI detection based on content stream

Broke many of the test cases
2016-02-27 00:18:36 -08:00
James R. Barlow
7c5e58a497 Fix test cases that break in Docker, improve test for running in Docker 2016-02-20 23:47:37 -08:00
James R. Barlow
323b9a5f8e Add other missing files 2016-02-20 05:34:21 -08:00
James R. Barlow
cab381a339 Add JPEG 2000 test case 2016-02-20 05:13:19 -08:00
James R. Barlow
8246cc0538 Gracefully recover from tesseract's failure to process very large images
And test cases to check this
2016-02-20 04:53:23 -08:00
James R. Barlow
ac71c3be63 4.0.2rc1 - release notes, add missing file caught by Travis 2016-02-20 03:36:37 -08:00
James R. Barlow
4206e74f42 tests: also check that monochrome correlation correctly detects matches 2016-02-19 14:35:31 -08:00
James R. Barlow
68c3ce56a9 Don't do chmod unless necessary (breaks py.test on Docker) 2016-02-19 14:09:56 -08:00
James R. Barlow
ab0e5fa425 Improve error checking for tesseract -psm 0 (orientation) errors 2016-02-19 03:58:39 -08:00
James R. Barlow
f3b0434a87 Improve ability to capture error messages from tesseract on a crash 2016-02-19 03:48:49 -08:00
James R. Barlow
812fd745b6 Remove redundant line from resources 2016-02-16 14:29:56 -08:00
James R. Barlow
ef0aab060a Make debug output more verbose on failure 2016-02-16 05:17:18 -08:00
James R. Barlow
88433e4c34 Fiddle with travis, try to get better debug output
Essentially cffi failed somehow, not clear how
2016-02-16 02:12:14 -08:00
James R. Barlow
1224af1780 Update test resources to address files with unknown source
- Remove Test_Issue_28.pdf (inherited from fritz-hh, source unknown)
- Replace missing_docinfo.pdf (received from user, but it's a printout of
  a website; unclear status, so created a new PDF with the same effect)
- Others are okay
2016-02-16 00:28:28 -08:00
James R. Barlow
ab13342931 Revise rotation tests in prep for adding a few more 2016-02-15 17:17:43 -08:00
James R. Barlow
d7913da484 Test case: remove filename conflict 2016-02-15 16:49:28 -08:00
James R. Barlow
6510bcad19 DPI information not transferred automatically from PNG to JPEG 2016-02-09 02:18:54 -08:00
James R. Barlow
265d2ce39b Better skewed image 2016-02-08 23:44:46 -08:00
James R. Barlow
3569c76c0f Also include cardinal.pdf 2016-02-08 15:23:04 -08:00
James R. Barlow
16c7ac2582 Fix test_deskew for new Leptonica API 2016-02-08 15:20:01 -08:00
James R. Barlow
4ceb59215f Leptonica: classes are better 2016-02-08 15:14:44 -08:00
James R. Barlow
2e6879ee51 Introduce Leptonica class for Pix 2016-02-08 14:52:01 -08:00
James R. Barlow
66fc2e9d7d Add rotate 180 correlation sanity check 2016-02-08 13:10:11 -08:00
James R. Barlow
2c7a6e574f Shorten names of _make_input/output 2016-02-08 12:57:26 -08:00
James R. Barlow
78c3bf5dba Check autorotate using leptonica correlation 2016-02-08 12:55:50 -08:00
James R. Barlow
98c115e3bb Cache wasn't enabled properly for test_autorotate 2016-02-08 12:55:28 -08:00
James R. Barlow
7c0940609a Take a stab at writing test case for autorotate 2016-02-08 12:32:39 -08:00
James R. Barlow
b907234d5c Update tesseract spoofing to cache orientation and script detection checks
No cache: 269 s
With cache: 144 s

test_oversample[tesseract] now fails, all others good
2016-02-08 02:21:56 -08:00
James R. Barlow
0dc96442d8 Fix img2pdf usage in test case (to make Travis CI happy again) 2016-02-06 23:41:32 -08:00
James R. Barlow
43b0faa830 Bug in tesseract_noop spoof: produced wrong page sizes
Now checks input image to ensure the implied page size of its .hocr file
matches the rest of the PDF.
2016-02-04 18:48:22 -08:00
James R. Barlow
9058dedfbe New tests for ccitt, jbig2 encodings 2016-01-19 13:01:56 -08:00
James R. Barlow
354e61946e Use os.makedirs for test output directories
Broke Travis
2016-01-16 02:47:56 -08:00
James R. Barlow
360acd1e2c Adjust test_oversample test case
Add -f to force generation of the background image at the desired
oversample resolution.  Our new behavior is to only send the oversampled
image to Tesseract while leaving the main page intact unless asked to
deskew, clean, etc.
2016-01-15 15:55:23 -08:00
James R. Barlow
c368c51bad New hocrtransform test 2016-01-15 14:14:08 -08:00
James R. Barlow
7c558b3713 Move pageinfo test into tests folder 2016-01-11 17:40:44 -08:00
James R. Barlow
3b53e9adac Use tesseract cache for -psm 2016-01-11 17:22:50 -08:00
James R. Barlow
074c1d71b4 Activate --tesseract-pagesegmode 2016-01-11 17:19:32 -08:00
James R. Barlow
09782242c8 Adjust test cases to use cache and noop more effectively
This reduces total execution time to 164s on my machine, down from
about double that.
2015-12-17 14:00:17 -08:00
James R. Barlow
9ec4aa039d Add tesseract caching to speed up tests 2015-12-17 12:52:12 -08:00
James R. Barlow
ecebe2f24b Let some tests use the spoofed tesseract
Where getting OCR doesn't matter
2015-12-17 11:56:09 -08:00
James R. Barlow
7313a77c2a Implement pdf renderer side of tess spoof 2015-12-17 11:41:54 -08:00
James R. Barlow
45113676a3 Add Tesseract spoofing 2015-12-17 11:36:47 -08:00