James R. Barlow
e71e8ca3ad
Workaround for GS VMerror -25 bug
...
Skip docinfo keys that would be translated to null strings,
so as not to run afoul of
https://bugs.ghostscript.com/show_bug.cgi?id=697684
2017-03-28 11:05:43 -07:00
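The workaround in e71e8ca3ad can be pictured as a filter applied to the metadata before it reaches Ghostscript. This is a minimal sketch, assuming the docinfo is held in a plain dict and that "null string" means a missing or blank value. The function name drop_empty_docinfo and the sample keys are illustrative and not taken from the OCRmyPDF source.

```python
def drop_empty_docinfo(docinfo):
    """Return a copy of docinfo without keys whose values are missing or blank.

    Assumption: a value of None or an all-whitespace string is what would
    end up as a null string in the generated docinfo.
    """
    return {
        key: value
        for key, value in docinfo.items()
        if value is not None and str(value).strip() != ""
    }


# Hypothetical metadata as it might arrive from the command line.
metadata = {"Title": "Scanned invoice", "Author": "", "Subject": None}
cleaned = drop_empty_docinfo(metadata)
```

Only the keys that carry real text survive, so nothing empty is written into the docinfo.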
James R. Barlow
199de96cff
Ghostscript 9.21 seems to have a regression related to Unicode metadata
2017-03-24 15:15:46 -07:00
James R. Barlow
8ddbe81513
Fix issue #147: unpaper loses DPI information, affects --pdf-renderer tess4
2017-03-24 13:23:03 -07:00
James R. Barlow
f035cb1088
Fixed issue #142 — closed streams raise an exception on fork attempt
2017-03-13 15:52:57 -07:00
James R. Barlow
72660d0dec
macOS: skip the one test that needs poppler, to avoid installing poppler
2017-03-11 17:03:26 -08:00
James R. Barlow
4a1fec8328
Improvements to macOS test and work on homebrew tap autobrew
...
Squashed commits:
[3f06c1e] Try setting up homebrew tap autobuilding
[01532f1] Strict mode error in brew
2017-03-11 17:00:54 -08:00
James R. Barlow
7cd2770a13
Fix issue #137 - proportions of non-square resolution distorted
...
Distortion mainly affected --force-ocr
2017-02-26 17:13:16 -08:00
James R. Barlow
d1a0065ef8
Create test case for Form XObjects
2017-02-14 12:51:15 -08:00
James R. Barlow
005216bc57
Support ocrmypdf-tess4
2017-01-29 18:26:52 -08:00
James R. Barlow
8c17c9918e
Add documentation and test cases for --tesseract-config
...
This parameter has existed for a long time but never really got any
attention.
2017-01-28 22:06:51 -08:00
James R. Barlow
9a15a4db10
Ensure specified destination is writable before starting pipeline process
2017-01-26 22:08:24 -08:00
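One way to perform the up-front check from 9a15a4db10 is to try opening the destination before any work starts. This is a rough sketch under that assumption. The helper name check_writable is hypothetical, and the actual commit may test writability differently.

```python
import os
import tempfile


def check_writable(output_path):
    """Raise OSError early if output_path cannot be opened for writing."""
    existed = os.path.exists(output_path)
    # Append mode avoids truncating a file that already exists.
    with open(output_path, "ab"):
        pass
    if not existed:
        # Do not leave an empty placeholder behind.
        os.unlink(output_path)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "out.pdf")
    check_writable(target)  # succeeds silently
    left_behind = os.path.exists(target)

    try:
        check_writable(os.path.join(tmp, "missing_dir", "out.pdf"))
        bad_path_rejected = False
    except OSError:
        bad_path_rejected = True
```

Failing here costs nothing, whereas failing at the final write would throw away a full OCR run.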
James R. Barlow
55aeaec293
Autorotation check: Replace duplicated tests with parameterized test
2017-01-26 18:07:59 -08:00
James R. Barlow
f6df1fb40c
Fix test suite regression: output files dumped in tests/resources
2017-01-26 18:07:09 -08:00
James R. Barlow
b889a89c36
Fix remaining 3.4/3.5 regressions
2017-01-26 17:53:27 -08:00
James R. Barlow
1976dc6f30
Fix issue #121 “pop from empty list” (content stream parsing error)
2017-01-26 17:24:40 -08:00
James R. Barlow
02fba02d31
Refactor test suite to use fixtures to manage paths
2017-01-26 16:38:59 -08:00
James R. Barlow
fb9e7c82f6
Move duplicate test code into common namespace
2017-01-26 13:36:52 -08:00
James R. Barlow
b8767e5ba9
Rename exe -> exec, more Unix-y and suggestive
2016-12-10 15:34:00 -08:00
James R. Barlow
d33a50660d
Replace most sys.exit() with raising exceptions
...
Because ruffus doesn't handle exceptions well, I tended to call sys.exit
to make sure we got out of Dodge when needed. However, sys.exit is not
ideal for the Python API this is moving towards, so this introduces
proper exceptions for the various cases that retain suggested error
codes. Only __main__.py should call sys.exit now; everything else has to
raise an exception.
For now the worker raising a fatal exception logs messages rather
than passing an exception object with the fatal error message, mainly
because ruffus doesn't properly marshal the exception object, so we
just check the name of the exception class that caused ruffus
to throw a RethrownJobError.
Also fixed along the way were the wrong return code being shown for
encrypted PDF checking, and incorrect use of str.find (e.output.find)
in boolean logic (str.find returns -1 on failure to find, which is truthy).
2016-12-10 15:24:24 -08:00
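The pattern described in d33a50660d, exceptions that carry a suggested exit code with a single sys.exit at the entry point, might look roughly like the sketch below. The class names, the numeric codes and the check_pdf helper are all hypothetical. The last few lines reproduce the str.find pitfall the message mentions.

```python
import sys


class ExitCodeError(Exception):
    """Base class: each subclass carries a suggested process exit code."""
    exit_code = 1  # hypothetical generic failure code


class EncryptedPdfError(ExitCodeError):
    exit_code = 8  # hypothetical code for encrypted input


def check_pdf(is_encrypted):
    """Library code raises instead of calling sys.exit."""
    if is_encrypted:
        raise EncryptedPdfError("input PDF is encrypted")


def main(is_encrypted):
    """Translate exceptions into an exit code in one place only."""
    try:
        check_pdf(is_encrypted)
    except ExitCodeError as e:
        print(e, file=sys.stderr)
        return e.exit_code
    return 0


# In __main__.py the only call to sys.exit would be: sys.exit(main(...))

# The str.find pitfall: -1 ("not found") is truthy, 0 ("found at start") is falsy.
output = "page 1 ok"
buggy = bool(output.find("error"))  # find returns -1 here
correct = "error" in output
```

Callers that use the package as a library can catch ExitCodeError themselves, which is not possible when deep code terminates the interpreter.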
James R. Barlow
4ee9658e97
Move external program wrappers to ocrmypdf.exe package
2016-12-09 16:54:24 -08:00
James R. Barlow
adc1580742
Help py.test collect output in more cases
2016-12-08 16:21:07 -08:00
James R. Barlow
e57aa0eee2
pageinfo: fix “decimal.InvalidOperation: quantize result has too many digits”
...
And add new test case for this.
2016-12-08 16:06:53 -08:00
James R. Barlow
731e6792c7
Add test cases for Ghostscript PDF/A warnings
2016-12-03 00:32:09 -08:00
James R. Barlow
bb91393b85
Fix “deskew-rotate” bug.
...
Turns out this occurred in any case where pdf-renderer hocr was used
and a tesseract timeout or error occurred. We created a replacement
page based on the unrotated page dimensions instead of the input image’s
dimensions.
2016-11-07 14:17:31 -08:00
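The fix in bb91393b85 amounts to sizing the replacement page from the image that was actually processed. A minimal sketch of that calculation follows, assuming the image size is known in pixels along with its DPI. The function name and the sample numbers are illustrative only.

```python
def page_size_points(width_px, height_px, xdpi, ydpi):
    """Convert an image size in pixels to a PDF page size in points (1/72 inch)."""
    return (width_px / xdpi * 72.0, height_px / ydpi * 72.0)


# A US Letter portrait page whose image was rotated to landscape upstream.
unrotated_page = (612.0, 792.0)   # points, from the original PDF page
rotated_image = (3300, 2550)      # pixels at 300 DPI, after rotation

# Using the image's own dimensions yields a landscape replacement page.
replacement = page_size_points(rotated_image[0], rotated_image[1], 300, 300)
```

Building the blank page from unrotated_page instead would produce a portrait page for a landscape image, which is the mismatch the commit describes.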
James R. Barlow
cc9c0d819e
Add test case for documents that get rotated incorrectly after deskew
2016-11-07 14:15:03 -08:00
James R. Barlow
2e4431cc63
Allow piping output to stdout
2016-10-27 16:14:42 -07:00
James R. Barlow
f7387b0859
test_stdin: simplify this test
...
No need to involve 'cat', just hook the file up to stdin.
2016-10-27 16:01:07 -07:00
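Hooking a file up to stdin directly, as f7387b0859 describes, can be done with the stdin argument of subprocess.run. The sketch below uses a small Python child process that echoes stdin as a stand-in for the real program. The child command and the file contents are assumptions made for illustration.

```python
import os
import subprocess
import sys
import tempfile

# Create a small input file to stand in for the test resource.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("hello from a file\n")
    path = f.name

# Stand-in child process: copies stdin to stdout.
child = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]

# No 'cat' needed: the open file object becomes the child's stdin.
with open(path, "rb") as infile:
    result = subprocess.run(child, stdin=infile, stdout=subprocess.PIPE, check=True)

os.unlink(path)
echoed = result.stdout.decode().strip()
```

This removes one process and one pipe from the test, and it also works on systems without a cat binary.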
James R. Barlow
a09f6b8977
Test cases: check that stdout is clear of output
...
To ensure piping to stdout is possible.
2016-10-27 15:58:24 -07:00
James R. Barlow
7eca8508fd
Implement new preprocessing feature, background removal
2016-10-14 17:23:34 -07:00
James R. Barlow
cf4b04f92d
The main 'quick' test should be a file that OCRs to recognizable text
2016-10-07 16:25:34 -07:00
James R. Barlow
013c5a369f
Replace redacted file with an OCR-able file
2016-10-07 12:45:22 -07:00
James R. Barlow
6baf8668a6
Replace non-free file milk.pdf with free equivalent
2016-10-06 13:10:28 -07:00
Sean Whitton
7f08f15fc9
pytest skipif for milk.pdf test ( #95 )
...
Skip the test if the fair use restricted milk.pdf is not present.
2016-09-15 08:55:31 -07:00
James R. Barlow
bd534c3313
main.py -> __main__.py
...
Executing a package with python -m packagename will check for
__main__.py inside the package. In other words main.py should have
always been named __main__.py.
In the unlikely event that someone depends on "import ocrmypdf.main"
being meaningful, main.py continues to exist and replicates the
behavior of __main__. (It's unlikely because import ocrmypdf.main does
unpythonic ruffus-related things at import time, essentially
configuring itself to work with sys.argv. To fix another day.)
This should solve the problem of Debian needing to run test suites
before installation and afterwards for continuous integration without
having to patch either file, as python -m ocrmypdf will follow import
order. That is, if the current directory contains "ocrmypdf/" (e.g.
staging a new version) then that will be tested, else sys.path will
be checked.
2016-08-31 17:01:42 -07:00
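The behavior bd534c3313 relies on can be demonstrated with a throwaway package: python -m runs the package's __main__.py, and a copy of the package in the current directory is found first. The package name demo_pkg and its contents are made up for this sketch.

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Build a minimal package: demo_pkg/__init__.py and demo_pkg/__main__.py
    pkg = os.path.join(tmp, "demo_pkg")
    os.mkdir(pkg)
    open(os.path.join(pkg, "__init__.py"), "w").close()
    with open(os.path.join(pkg, "__main__.py"), "w") as f:
        f.write("print('running demo_pkg.__main__')\n")

    # Run from the directory that contains the package, so the local copy
    # is the one that gets imported.
    result = subprocess.run(
        [sys.executable, "-m", "demo_pkg"],
        cwd=tmp,
        stdout=subprocess.PIPE,
        check=True,
    )
    output = result.stdout.decode().strip()
```

Because the staged copy in the working directory is picked up ahead of any installed one, the same command serves both pre-install and post-install test runs.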
James R. Barlow
bf89e38c69
Add milk.pdf test case
2016-08-31 11:42:21 -07:00
James R. Barlow
325cc0beca
Allow test cases to run without installing first
...
As @spwhitton found:
The test suite needs to call "python3 -m ocrmypdf.main" instead of
just "ocrmypdf" because this /usr/bin/ocrmypdf script has not yet been
generated when dh runs the test suite.
---
Seems reasonable to perform in-place testing independent of installation.
Source:
https://sources.debian.net/src/ocrmypdf/4.2.1%2Bgit.20160824.1.5d67cc7-1/debian/patches/0001-patch-test-suite-executable.patch/
2016-08-26 15:23:26 -07:00
James R. Barlow
1a9f09c4d5
Remove OCRmyPDF.sh and its usage in all test cases
2016-08-26 15:18:38 -07:00
James R. Barlow
4fed4e2af3
tests: don't try to pass Unicode arguments on command line on Linux
...
Depends on locale being configured properly, and it's not necessary
to be able to do this.
2016-08-26 15:08:56 -07:00
James R. Barlow
cc7e328358
Improve some documentation for tests
2016-08-26 15:04:08 -07:00
James R. Barlow
d25397e2b0
Add test case for PDFs with masks and stencil masks
2016-08-26 15:03:27 -07:00
James R. Barlow
2025a096c3
Test case for stdin streaming
2016-08-25 14:46:54 -07:00
James R. Barlow
e5541e435c
New test to confirm we can emit JBIG2 with appropriate settings
2016-08-03 11:35:48 -07:00
James R. Barlow
e70387b1af
Add a simple test for image to PDF
2016-08-03 03:35:30 -07:00
James R. Barlow
91d715ac93
Add test cases for --output-type
2016-08-03 02:47:18 -07:00
James R. Barlow
fef35e4eb2
Fix handling of DPI for rare case of JPEG recompression after deskew/clean
...
This test is exercised by page 4 of multipage.pdf. If all images are
JPEGs, and one of deskew/clean removes DPI information, make sure that
we can get the right information back and that the DPI stays square.
2016-07-29 01:34:52 -07:00
James R. Barlow
8f77576dc4
Fix non-square image resolution for "hocr" case; use img2pdf 0.2.1
...
Tesseract renderer not immediately fixable.
2016-07-28 16:43:51 -07:00
James R. Barlow
16e4d342d2
Bug fix: --force-ocr should still run on pages with no images
...
Useful for people who want to reprocess text.
This also requires --oversample because DPI is undefined. To be fixed
in next commit.
2016-07-27 15:06:49 -07:00
James R. Barlow
b4a734fc0d
Test case for "algorithm 4"
...
Algorithm 4 -> PDF version 1.6
2016-06-23 13:21:26 -07:00
James R. Barlow
ff092c8629
Fix race condition between these tests when run in parallel
2016-04-28 00:39:15 -07:00
James R. Barlow
40baab32ac
Remove dead code "import stuff in testcase"
2016-04-14 14:22:34 -07:00