James R. Barlow
cf4b04f92d
The main 'quick' test should be a file that OCRs to recognizable text
2016-10-07 16:25:34 -07:00
James R. Barlow
013c5a369f
Replace redacted file with an OCR-able file
2016-10-07 12:45:22 -07:00
James R. Barlow
6baf8668a6
Replace non-free file milk.pdf with free equivalent
2016-10-06 13:10:28 -07:00
Sean Whitton
7f08f15fc9
pytest skipif for milk.pdf test (#95)
Skip the test if the fair use restricted milk.pdf is not present.
2016-09-15 08:55:31 -07:00
James R. Barlow
bd534c3313
main.py -> __main__.py
Executing a package with python -m packagename will check for
__main__.py inside the package. In other words main.py should have
always been named __main__.py.
In the unlikely event that someone depends on "import ocrmypdf.main"
being meaningful, main.py continues to exist and replicates the
behavior of __main__. (It's unlikely because import ocrmypdf.main does
unpythonic ruffus-related things at import time, essentially
configuring itself to work with sys.argv. To fix another day.)
This should solve the problem of Debian needing to run test suites
before installation and afterwards for continuous integration without
having to patch either file, as python -m ocrmypdf will follow import
order. That is, if the current directory contains "ocrmypdf/" (e.g.
staging a new version) then that will be tested, else sys.path will
be checked.
2016-08-31 17:01:42 -07:00
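As a hedged illustration of the `python -m` behavior this commit relies on, the sketch below stages a throwaway package (the name `demo_pkg` is hypothetical, standing in for `ocrmypdf`) containing a `__main__.py` and a legacy `main.py` that delegates to it, then runs both with `python -m` from the package's parent directory. Because `python -m` adds the current directory to the start of `sys.path`, the staged copy is the one executed even if another version is installed. The delegating `main.py` is one assumed way to "replicate the behavior of `__main__`", not OCRmyPDF's actual code.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical package standing in for ocrmypdf; the names are illustrative only.
DUNDER_MAIN = '''\
def run():
    print("demo_pkg running")

if __name__ == "__main__":
    run()
'''

# Legacy entry point kept for compatibility: it simply delegates to __main__.
LEGACY_MAIN = '''\
from demo_pkg.__main__ import run

if __name__ == "__main__":
    run()
'''


def make_package(root: Path) -> None:
    """Stage demo_pkg/ with __init__.py, __main__.py and main.py under root."""
    pkg = root / "demo_pkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "__main__.py").write_text(DUNDER_MAIN)
    (pkg / "main.py").write_text(LEGACY_MAIN)


def run_module(cwd: Path, module: str) -> str:
    """Run `python -m module` from cwd and return its stripped stdout."""
    # python -m puts the current directory first on sys.path, so a package
    # staged in cwd shadows any installed copy. PYTHONSAFEPATH (3.11+) would
    # disable that, so drop it from the child environment.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONSAFEPATH"}
    result = subprocess.run(
        [sys.executable, "-m", module],
        cwd=cwd, env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        make_package(root)
        print(run_module(root, "demo_pkg"))       # executes demo_pkg/__main__.py
        print(run_module(root, "demo_pkg.main"))  # legacy path, same behavior
```

Importing `demo_pkg.__main__` from `main.py` does not trigger its `if __name__ == "__main__"` guard, so only the legacy module's own guard calls `run()`, and each invocation prints the message once.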
James R. Barlow
bf89e38c69
Add milk.pdf test case
2016-08-31 11:42:21 -07:00
James R. Barlow
325cc0beca
Allow test cases to run without installing first
As @spwhitton found:
The test suite needs to call "python3 -m ocrmypdf.main" instead of
just "ocrmypdf" because this /usr/bin/ocrmypdf script has not yet been
generated when dh runs the test suite.
---
Seems reasonable to perform in-place testing independent of installation.
Source:
https://sources.debian.net/src/ocrmypdf/4.2.1%2Bgit.20160824.1.5d67cc7-1/debian/patches/0001-patch-test-suite-executable.patch/
2016-08-26 15:23:26 -07:00
James R. Barlow
1a9f09c4d5
Remove OCRmyPDF.sh and its usage in all test cases
2016-08-26 15:18:38 -07:00
James R. Barlow
4fed4e2af3
tests: don't try to pass Unicode arguments on command line on Linux
Depends on locale being configured properly, and it's not necessary
to be able to do this.
2016-08-26 15:08:56 -07:00
James R. Barlow
cc7e328358
Improve some documentation for tests
2016-08-26 15:04:08 -07:00
James R. Barlow
d25397e2b0
Add test case for PDFs with masks and stencil masks
2016-08-26 15:03:27 -07:00
James R. Barlow
2025a096c3
Test case for stdin streaming
2016-08-25 14:46:54 -07:00
James R. Barlow
e5541e435c
New test to confirm we can emit JBIG2 with appropriate settings
2016-08-03 11:35:48 -07:00
James R. Barlow
e70387b1af
Add a simple test for image to PDF
2016-08-03 03:35:30 -07:00
James R. Barlow
91d715ac93
Add test cases for --output-type
2016-08-03 02:47:18 -07:00
James R. Barlow
fef35e4eb2
Fix handling of DPI for rare case of JPEG recompression after deskew/clean
This test is exercised by page 4 of multipage.pdf. If all images are
JPEGs, and one of deskew/clean removes DPI information, make sure that
we can get the right information back and that the DPI stays square.
2016-07-29 01:34:52 -07:00
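The commit body does not spell out how the DPI is recovered. One common approach, sketched here under the assumption that the image fills the whole page, is to recompute resolution from the image's pixel dimensions and the page size in PDF points (72 points per inch), then use the larger of the two axis values for both so the result stays square. `recover_square_dpi` is a hypothetical helper, not OCRmyPDF's API, and taking the maximum is an assumed tie-break chosen to avoid downsampling.

```python
from typing import Tuple

POINTS_PER_INCH = 72.0  # PDF user-space unit: 1 point = 1/72 inch


def recover_square_dpi(width_px: int, height_px: int,
                       width_pt: float, height_pt: float) -> Tuple[float, float]:
    """Hypothetical helper: recompute DPI for an image that fills the page.

    If the horizontal and vertical values disagree (for example, after a
    deskew changes the pixel count slightly), use the larger one for both
    axes so the resolution stays square.
    """
    xdpi = width_px / (width_pt / POINTS_PER_INCH)
    ydpi = height_px / (height_pt / POINTS_PER_INCH)
    dpi = max(xdpi, ydpi)
    return dpi, dpi


if __name__ == "__main__":
    # A US Letter page (612 x 792 pt) scanned at 300 DPI is 2550 x 3300 px.
    print(recover_square_dpi(2550, 3300, 612, 792))
```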
James R. Barlow
8f77576dc4
Fix non-square image resolution for "hocr" case; use img2pdf 0.2.1
Tesseract renderer not immediately fixable.
2016-07-28 16:43:51 -07:00
James R. Barlow
16e4d342d2
Bug fix: --force-ocr should still run on pages with no images
Useful for people who want to reprocess text.
This also requires --oversample because DPI is undefined. To be fixed
in next commit.
2016-07-27 15:06:49 -07:00
James R. Barlow
b4a734fc0d
Test case for "algorithm 4" test
Algorithm 4 -> PDF version 1.6
2016-06-23 13:21:26 -07:00
James R. Barlow
ff092c8629
Fix race condition between these tests when run in parallel
2016-04-28 00:39:15 -07:00
James R. Barlow
40baab32ac
Remove dead code "import stuff in testcase"
2016-04-14 14:22:34 -07:00
James R. Barlow
e877d37ac8
--rotate-pages: Only apply rotation if we're reasonably confident
Take the threshold from tesseract's default value for -psm 1.
2016-04-14 13:49:44 -07:00
James R. Barlow
322085933b
unpaper: fix check for missing and old versions, add test case
2016-03-10 15:37:09 -08:00
James R. Barlow
7c5e58a497
Fix test cases that break in Docker, improve test for running in Docker
2016-02-20 23:47:37 -08:00
James R. Barlow
cab381a339
Add JPEG 2000 test case
2016-02-20 05:13:19 -08:00
James R. Barlow
8246cc0538
Gracefully recover from tesseract's failure to process very large images
Also add test cases to check this.
2016-02-20 04:53:23 -08:00
James R. Barlow
4206e74f42
tests: also check that monochrome correlation correctly detects matches
2016-02-19 14:35:31 -08:00
James R. Barlow
68c3ce56a9
Don't do chmod unless necessary (breaks py.test on Docker)
2016-02-19 14:09:56 -08:00
James R. Barlow
ab0e5fa425
Improve error checking for tesseract -psm 0 (orientation) errors
2016-02-19 03:58:39 -08:00
James R. Barlow
f3b0434a87
Improve ability to capture error messages from tesseract on a crash
2016-02-19 03:48:49 -08:00
James R. Barlow
ef0aab060a
Make debug output more verbose on failure
2016-02-16 05:17:18 -08:00
James R. Barlow
88433e4c34
Fiddle with travis, try to get better debug output
Essentially, cffi failed somehow; it is not clear how.
2016-02-16 02:12:14 -08:00
James R. Barlow
ab13342931
Revise rotation tests in prep for adding a few more
2016-02-15 17:17:43 -08:00
James R. Barlow
d7913da484
Test case: remove filename conflict
2016-02-15 16:49:28 -08:00
James R. Barlow
6510bcad19
DPI information not transferred automatically from PNG to JPEG
2016-02-09 02:18:54 -08:00
James R. Barlow
16c7ac2582
Fix test_deskew for new Leptonica API
2016-02-08 15:20:01 -08:00
James R. Barlow
4ceb59215f
Leptonica: classes are better
2016-02-08 15:14:44 -08:00
James R. Barlow
2e6879ee51
Introduce Leptonica class for Pix
2016-02-08 14:52:01 -08:00
James R. Barlow
66fc2e9d7d
Add rotate 180 correlation sanity check
2016-02-08 13:10:11 -08:00
James R. Barlow
2c7a6e574f
Shorten names of _make_input/output
2016-02-08 12:57:26 -08:00
James R. Barlow
78c3bf5dba
Check autorotate using leptonica correlation
2016-02-08 12:55:50 -08:00
James R. Barlow
98c115e3bb
Cache wasn't enabled properly for test_autorotate
2016-02-08 12:55:28 -08:00
James R. Barlow
7c0940609a
Take a stab at writing test case for autorotate
2016-02-08 12:32:39 -08:00
James R. Barlow
9058dedfbe
New tests for ccitt, jbig2 encodings
2016-01-19 13:01:56 -08:00
James R. Barlow
354e61946e
Use os.makedirs for test output directories
Broke Travis
2016-01-16 02:47:56 -08:00
James R. Barlow
360acd1e2c
Adjust test_oversample test case
Add -f to force generation of the background image at the desired
oversample resolution. Our new behavior is to only send the oversampled
image to Tesseract while leaving the main page intact unless asked to
deskew, clean, etc.
2016-01-15 15:55:23 -08:00
James R. Barlow
7c558b3713
Move pageinfo test into tests folder
2016-01-11 17:40:44 -08:00
James R. Barlow
3b53e9adac
Use tesseract cache for -psm
2016-01-11 17:22:50 -08:00
James R. Barlow
074c1d71b4
Activate --tesseract-pagesegmode
2016-01-11 17:19:32 -08:00
James R. Barlow
09782242c8
Adjust test cases to use cache and noop more effectively
This reduces total execution time to 164s on my machine, down from
about double that.
2015-12-17 14:00:17 -08:00