James R. Barlow
b4a734fc0d
Test case for "algorithm 4" test
...
Algorithm 4 -> PDF version 1.6
2016-06-23 13:21:26 -07:00
James R. Barlow
ff092c8629
Fix race condition between these tests when run in parallel
2016-04-28 00:39:15 -07:00
James R. Barlow
40baab32ac
Remove dead code "import stuff in testcase"
2016-04-14 14:22:34 -07:00
James R. Barlow
e877d37ac8
--rotate-pages: Only apply rotation if we're reasonably confident
...
Take the threshold from tesseract's default value for -psm 1.
2016-04-14 13:49:44 -07:00
James R. Barlow
322085933b
unpaper: fix check for missing and old versions, add test case
2016-03-10 15:37:09 -08:00
James R. Barlow
f3e06b2dbd
Add bookmarks to file for more testing
2016-02-29 00:05:07 -08:00
James R. Barlow
570bbe9a05
Add comments and remove debugging, improve inline handling
...
Squashed commits:
[bfff3c9] pageinfo, have a main()
2016-02-27 00:18:36 -08:00
James R. Barlow
5cc3adb39a
Add support for inline images
2016-02-27 00:18:36 -08:00
James R. Barlow
3957a0606c
Compute image pixel density without performing rectangle intersection (+5 squashed commits)
...
Squashed commits:
[0e27904] Partially implement DPI calculation with rotation of the image
Fixes test suite
[a64f662] pageinfo: all tests pass
[c5b811a] Fix typos
[cdd2286] Can now find inline images more efficiently
[60dde8d] First cut at implementing intelligent DPI detection based on content stream
Broke many of the test cases
2016-02-27 00:18:36 -08:00
James R. Barlow
7c5e58a497
Fix test cases that break in Docker, improve test for running in Docker
2016-02-20 23:47:37 -08:00
James R. Barlow
323b9a5f8e
Add other missing files
2016-02-20 05:34:21 -08:00
James R. Barlow
cab381a339
Add JPEG 2000 test case
2016-02-20 05:13:19 -08:00
James R. Barlow
8246cc0538
Gracefully recover from tesseract's failure to process very large images
...
And test cases to check this
2016-02-20 04:53:23 -08:00
James R. Barlow
ac71c3be63
4.0.2rc1 - release notes, add missing file caught by Travis
2016-02-20 03:36:37 -08:00
James R. Barlow
4206e74f42
tests: also check that monochrome correlation correctly detects matches
2016-02-19 14:35:31 -08:00
James R. Barlow
68c3ce56a9
Don't do chmod unless necessary (breaks py.test on Docker)
2016-02-19 14:09:56 -08:00
James R. Barlow
ab0e5fa425
Improve error checking for tesseract -psm 0 (orientation) errors
2016-02-19 03:58:39 -08:00
James R. Barlow
f3b0434a87
Improve ability to capture error messages from tesseract on a crash
2016-02-19 03:48:49 -08:00
James R. Barlow
812fd745b6
Remove redundant line from resources
2016-02-16 14:29:56 -08:00
James R. Barlow
ef0aab060a
Make debug output more verbose on failure
2016-02-16 05:17:18 -08:00
James R. Barlow
88433e4c34
Fiddle with travis, try to get better debug output
...
Essentially cffi failed somehow, not clear how
2016-02-16 02:12:14 -08:00
James R. Barlow
1224af1780
Update test resources to address files with unknown source
...
-Remove Test_Issue_28.pdf (inherited from fritz-hh, source unknown)
-Replace missing_docinfo.pdf (received from user, but it's a printout of
a website; unclear status, so created a new PDF with the same effect)
-Others are okay
2016-02-16 00:28:28 -08:00
James R. Barlow
ab13342931
Revise rotation tests in prep for adding a few more
2016-02-15 17:17:43 -08:00
James R. Barlow
d7913da484
Test case: remove filename conflict
2016-02-15 16:49:28 -08:00
James R. Barlow
6510bcad19
DPI information not transferred automatically from PNG to JPEG
2016-02-09 02:18:54 -08:00
James R. Barlow
265d2ce39b
Better skewed image
2016-02-08 23:44:46 -08:00
James R. Barlow
3569c76c0f
Also include cardinal.pdf
2016-02-08 15:23:04 -08:00
James R. Barlow
16c7ac2582
Fix test_deskew for new Leptonica API
2016-02-08 15:20:01 -08:00
James R. Barlow
4ceb59215f
Leptonica: classes are better
2016-02-08 15:14:44 -08:00
James R. Barlow
2e6879ee51
Introduce Leptonica class for Pix
2016-02-08 14:52:01 -08:00
James R. Barlow
66fc2e9d7d
Add rotate 180 correlation sanity check
2016-02-08 13:10:11 -08:00
James R. Barlow
2c7a6e574f
Shorten names of _make_input/output
2016-02-08 12:57:26 -08:00
James R. Barlow
78c3bf5dba
Check autorotate using leptonica correlation
2016-02-08 12:55:50 -08:00
James R. Barlow
98c115e3bb
Cache wasn't enabled properly for test_autorotate
2016-02-08 12:55:28 -08:00
James R. Barlow
7c0940609a
Take a stab at writing test case for autorotate
2016-02-08 12:32:39 -08:00
James R. Barlow
b907234d5c
Update tesseract spoofing to cache orientation and script detection checks
...
No cache: 269 s
With cache: 144 s
test_oversample[tesseract] now fails, all others good
2016-02-08 02:21:56 -08:00
James R. Barlow
0dc96442d8
Fix img2pdf usage in test case (to make Travis CI happy again)
2016-02-06 23:41:32 -08:00
James R. Barlow
43b0faa830
Bug in tesseract_noop spoof: produced wrong page sizes
...
Now checks input image to ensure the implied page size of its .hocr file
matches the rest of the PDF.
2016-02-04 18:48:22 -08:00
James R. Barlow
9058dedfbe
New tests for ccitt, jbig2 encodings
2016-01-19 13:01:56 -08:00
James R. Barlow
354e61946e
Use os.makedirs for test output directories
...
Broke Travis
2016-01-16 02:47:56 -08:00
James R. Barlow
360acd1e2c
Adjust test_oversample test case
...
Add -f to force generation of the background image at the desired
oversample resolution. Our new behavior is to only send the oversampled
image to Tesseract while leaving the main page intact unless asked to
deskew, clean, etc.
2016-01-15 15:55:23 -08:00
James R. Barlow
c368c51bad
New hocrtransform test
2016-01-15 14:14:08 -08:00
James R. Barlow
7c558b3713
Move pageinfo test into tests folder
2016-01-11 17:40:44 -08:00
James R. Barlow
3b53e9adac
Use tesseract cache for -psm
2016-01-11 17:22:50 -08:00
James R. Barlow
074c1d71b4
Activate --tesseract-pagesegmode
2016-01-11 17:19:32 -08:00
James R. Barlow
09782242c8
Adjust test cases to use cache and noop more effectively
...
This reduces total execution time to 164s on my machine, down from
about double that.
2015-12-17 14:00:17 -08:00
James R. Barlow
9ec4aa039d
Add tesseract caching to speed up tests
2015-12-17 12:52:12 -08:00
James R. Barlow
ecebe2f24b
Let some tests use the spoofed tesseract
...
Where getting OCR doesn't matter
2015-12-17 11:56:09 -08:00
James R. Barlow
7313a77c2a
Implement pdf renderer side of tess spoof
2015-12-17 11:41:54 -08:00
James R. Barlow
45113676a3
Add Tesseract spoofing
2015-12-17 11:36:47 -08:00