James R. Barlow
adc1580742
Help py.test collect output in more cases
2016-12-08 16:21:07 -08:00
James R. Barlow
e57aa0eee2
pageinfo: fix “decimal.InvalidOperation: quantize result has too many digits”
...
And add new test case for this.
2016-12-08 16:06:53 -08:00
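For context on the error named in the commit above: `Decimal.quantize` raises `decimal.InvalidOperation` when the quantized coefficient would need more digits than the active context's precision (28 by default). The log does not show how pageinfo was fixed, so the sketch below only reproduces that class of error and shows one possible workaround. The helper names and the `prec=60` value are assumptions for illustration, not code from the project.

```python
from decimal import Decimal, InvalidOperation, localcontext

CENTS = Decimal("0.01")


def quantize_default(value):
    """Quantize to two decimal places under the current context."""
    return value.quantize(CENTS)


def quantize_wide(value, prec=60):
    """Quantize under a temporary, wider context (hypothetical workaround)."""
    with localcontext() as ctx:
        ctx.prec = prec  # assumed value; just needs to exceed the digits required
        return value.quantize(CENTS)


if __name__ == "__main__":
    huge = Decimal("1e30")  # needs 33 digits at two decimal places
    try:
        quantize_default(huge)
    except InvalidOperation as exc:
        print("default context failed:", type(exc).__name__)
    print("wide context:", quantize_wide(huge))
```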
James R. Barlow
731e6792c7
Add test cases for Ghostscript PDF/A warnings
2016-12-03 00:32:09 -08:00
James R. Barlow
949d2ff1c2
v4.3.1 release notes
2016-11-07 14:36:08 -08:00
James R. Barlow
1c8b763d53
test_pageinfo: Remove bits per component test
...
The behavior of this test will ultimately depend on what version of
img2pdf is installed, since after my patch it will be able to produce
1bpp images.
2016-11-07 14:35:54 -08:00
James R. Barlow
bb91393b85
Fix “deskew-rotate” bug.
...
Turns out this occurred in any case where pdf-renderer hocr was used
and a tesseract timeout or error occurred. We created a replacement
page based on the unrotated page dimensions instead of the input image’s
dimensions.
2016-11-07 14:17:31 -08:00
James R. Barlow
cc9c0d819e
Add test case for documents that get rotated incorrectly after deskew
2016-11-07 14:15:03 -08:00
James R. Barlow
fdd9b8b8ce
Optimize some of the test resources to reduce file sizes
...
Mostly by reducing RGB -> monochrome and applying JBIG2 compression
2016-11-07 14:01:23 -08:00
James R. Barlow
a4f07756a5
tesseract caching: don't transcode tesseract's output, hash source file
...
For sanity's sake, deal with tesseract streams in binary without
transcoding (via universal_newlines, etc.). The only differences are
printing messages regarding spoofing.
Also hash the source file so that changes to the cache mechanism
invalidate old cache automatically. That is probably too aggressive,
but simple and safer than the previous approach.
2016-10-28 16:44:12 -07:00
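The invalidation idea in the commit above can be sketched as a cache key that mixes in a hash of the caching code's own source. This reads "source file" as the file implementing the cache, which is an interpretation of the message. The function name, the parameters, and the choice of SHA-256 are assumptions, not the project's code.

```python
import hashlib


def cache_key(mechanism_source: bytes, args: list, input_bytes: bytes) -> str:
    """Build a cache key that changes whenever the caching code itself changes.

    mechanism_source: raw bytes of the file that implements the cache (assumed).
    args: command-line arguments passed to the wrapped program.
    input_bytes: raw bytes of the input image, handled in binary, untranscoded.
    """
    h = hashlib.sha256()
    h.update(mechanism_source)
    for arg in args:
        h.update(arg.encode("utf-8"))
        h.update(b"\0")  # separator so ["ab", "c"] and ["a", "bc"] differ
    h.update(input_bytes)
    return h.hexdigest()


if __name__ == "__main__":
    old = cache_key(b"# cache v1", ["-l", "eng"], b"image-bytes")
    new = cache_key(b"# cache v2", ["-l", "eng"], b"image-bytes")
    print(old != new)  # editing the mechanism produces a different key
```

Any edit to the caching code changes every key, so stale entries are simply never looked up again. That matches the "too aggressive, but simple" trade-off the commit describes.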
James R. Barlow
2e4431cc63
Allow piping output to stdout
2016-10-27 16:14:42 -07:00
James R. Barlow
f7387b0859
test_stdin: simplify this test
...
No need to involve 'cat', just hook the file up to stdin.
2016-10-27 16:01:07 -07:00
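A minimal sketch of the simplification described above: rather than piping through `cat`, open the input file and pass the handle as the child's stdin. The helper name is hypothetical, and a small Python one-liner stands in for the real command so the example runs on its own.

```python
import subprocess
import sys


def run_with_file_on_stdin(cmd, path):
    """Run cmd with the file at path connected directly to its stdin."""
    with open(path, "rb") as infile:
        return subprocess.run(
            cmd, stdin=infile, stdout=subprocess.PIPE, check=True
        )


# Stand-in child process (assumption): copies stdin to stdout in binary mode.
ECHO_CMD = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())",
]

if __name__ == "__main__":
    result = run_with_file_on_stdin(ECHO_CMD, __file__)
    print(len(result.stdout), "bytes passed through")
```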
James R. Barlow
a09f6b8977
Test cases: check that stdout is clear of output
...
To ensure piping to stdout is possible.
2016-10-27 15:58:24 -07:00
James R. Barlow
a86805f0d9
Remove possibly non-free page from "multipage.pdf"
2016-10-27 15:56:43 -07:00
James R. Barlow
7eca8508fd
Implement new preprocessing feature, background removal
2016-10-14 17:23:34 -07:00
James R. Barlow
cf4b04f92d
The main 'quick' test should be a file that OCRs to recognizable text
2016-10-07 16:25:34 -07:00
James R. Barlow
013c5a369f
Replace redacted file with an OCR-able file
2016-10-07 12:45:22 -07:00
James R. Barlow
6baf8668a6
Replace non-free file milk.pdf with free equivalent
2016-10-06 13:10:28 -07:00
James R. Barlow
4ba2962c56
Comment on non-free files
2016-10-05 16:48:16 -07:00
James R. Barlow
7ad92f5db4
Merge branch 'master' of https://github.com/jbarlow83/OCRmyPDF
2016-10-05 16:39:00 -07:00
James R. Barlow
4dad09cc91
resources/README: replace the other large table with a list table
2016-10-05 16:38:51 -07:00
Sean Whitton
7f08f15fc9
pytest skipif for milk.pdf test (#95)
...
Skip the test if the fair use restricted milk.pdf is not present.
2016-09-15 08:55:31 -07:00
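The skip described above might look roughly like the following. This is only a sketch: the resource path and the test body are assumptions, since the log does not show the actual test.

```python
import os

import pytest

# Hypothetical location of the optional, fair-use restricted test resource.
MILK_PDF = os.path.join("tests", "resources", "milk.pdf")


@pytest.mark.skipif(
    not os.path.exists(MILK_PDF),
    reason="milk.pdf is fair-use restricted and not present",
)
def test_milk():
    # Placeholder assertion; the real test would run OCR on the file.
    assert os.path.getsize(MILK_PDF) > 0
```

With this marker, pytest reports the test as skipped rather than failed when the file has been stripped from a distribution.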
James R. Barlow
825c0f8b2a
Note that milk.pdf is non-free, start using list-tables
2016-09-10 14:44:00 -07:00
James R. Barlow
9ca29c787b
Update description of masks.pdf to reflect what it actually tests
2016-09-01 21:21:14 -07:00
James R. Barlow
bd534c3313
main.py -> __main__.py
...
Executing a package with python -m packagename will check for
__main__.py inside the package. In other words main.py should have
always been named __main__.py.
In the unlikely event that someone depends on "import ocrmypdf.main"
being meaningful, main.py continues to exist and replicates the
behavior of __main__. (It's unlikely because import ocrmypdf.main does
unpythonic ruffus-related things at import time, essentially
configuring itself to work with sys.argv. To fix another day.)
This should solve the problem of Debian needing to run test suites
before installation and afterwards for continuous integration without
having to patch either file, as python -m ocrmypdf will follow import
order. That is, if the current directory contains "ocrmypdf/" (e.g.
staging a new version) then that will be tested, else sys.path will
be checked.
2016-08-31 17:01:42 -07:00
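The `python -m` behavior the commit above relies on can be checked with a throwaway package. The sketch below builds a hypothetical `demo_pkg` (not part of OCRmyPDF) in a temporary directory and runs it from there. With `-m`, the current directory is searched first, so the local copy's `__main__.py` is what executes.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_package(root: Path, name: str = "demo_pkg") -> str:
    """Create a tiny package under root and execute it with `python -m`."""
    pkg = root / name
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "__main__.py").write_text("print('running __main__')\n")
    result = subprocess.run(
        [sys.executable, "-m", name],
        cwd=str(root),  # current directory is searched first under -m
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_package(Path(tmp)))
```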
James R. Barlow
bf89e38c69
Add milk.pdf test case
2016-08-31 11:42:21 -07:00
James R. Barlow
325cc0beca
Allow test cases to run without installing first
...
As @spwhitton found:
The test suite needs to call "python3 -m ocrmypdf.main" instead of
just "ocrmypdf" because this /usr/bin/ocrmypdf script has not yet been
generated when dh runs the test suite.
---
Seems reasonable to perform in-place testing independent of installation.
Source:
https://sources.debian.net/src/ocrmypdf/4.2.1%2Bgit.20160824.1.5d67cc7-1/debian/patches/0001-patch-test-suite-executable.patch/
2016-08-26 15:23:26 -07:00
James R. Barlow
1a9f09c4d5
Remove OCRmyPDF.sh and its usage in all test cases
2016-08-26 15:18:38 -07:00
James R. Barlow
4fed4e2af3
tests: don't try to pass Unicode arguments on command line on Linux
...
Depends on locale being configured properly, and it's not necessary
to be able to do this.
2016-08-26 15:08:56 -07:00
James R. Barlow
cc7e328358
Improve some documentation for tests
2016-08-26 15:04:08 -07:00
James R. Barlow
d25397e2b0
Add test case for PDFs with masks and stencil masks
2016-08-26 15:03:27 -07:00
James R. Barlow
2025a096c3
Test case for stdin streaming
2016-08-25 14:46:54 -07:00
James R. Barlow
e5541e435c
New test to confirm we can emit JBIG2 with appropriate settings
2016-08-03 11:35:48 -07:00
James R. Barlow
e70387b1af
Add a simple test for image to PDF
2016-08-03 03:35:30 -07:00
James R. Barlow
91d715ac93
Add test cases for --output-type
2016-08-03 02:47:18 -07:00
James R. Barlow
fef35e4eb2
Fix handling of DPI for rare case of JPEG recompression after deskew/clean
...
This test is exercised by page 4 of multipage.pdf. If all images are
JPEGs, and one of deskew/clean removes DPI information, make sure that
we can get the right information back and that the DPI stays square.
2016-07-29 01:34:52 -07:00
James R. Barlow
8f77576dc4
Fix non-square image resolution for "hocr" case; use img2pdf 0.2.1
...
Tesseract renderer not immediately fixable.
2016-07-28 16:43:51 -07:00
James R. Barlow
16e4d342d2
Bug fix: --force-ocr should still run on pages with no images
...
Useful for people who want to reprocess text.
This also requires --oversample because DPI is undefined. To be fixed
in next commit.
2016-07-27 15:06:49 -07:00
jbarlow83
1bacf35a2c
Update license information for encrypted_algo4.pdf
2016-06-24 14:25:15 -07:00
James R. Barlow
b4a734fc0d
Test case for "algorithm 4" test
...
Algorithm 4 -> PDF version 1.6
2016-06-23 13:21:26 -07:00
James R. Barlow
ff092c8629
Fix race condition between these tests when run in parallel
2016-04-28 00:39:15 -07:00
James R. Barlow
40baab32ac
Remove dead code "import stuff in testcase"
2016-04-14 14:22:34 -07:00
James R. Barlow
e877d37ac8
--rotate-pages: Only apply rotation if we're reasonably confident
...
Take the threshold from tesseract's default value for -psm 1.
2016-04-14 13:49:44 -07:00
James R. Barlow
322085933b
unpaper: fix check for missing and old versions, add test case
2016-03-10 15:37:09 -08:00
James R. Barlow
f3e06b2dbd
Add bookmarks to file for more testing
2016-02-29 00:05:07 -08:00
James R. Barlow
570bbe9a05
Add comments and remove debugging, improve inline handling
...
Squashed commits:
[bfff3c9] pageinfo, have a main()
2016-02-27 00:18:36 -08:00
James R. Barlow
5cc3adb39a
Add support for inline images
2016-02-27 00:18:36 -08:00
James R. Barlow
3957a0606c
Compute image pixel density without performing rectangle intersection (+5 squashed commits)
...
Squashed commits:
[0e27904] Partially implement DPI calculation with rotation of the image
Fixes test suite
[a64f662] pageinfo: all tests pass
[c5b811a] Fix typos
[cdd2286] Can now find inline images more efficiently
[60dde8d] First cut at implementing intelligent DPI detection based on content stream
Broke many of the test cases
2016-02-27 00:18:36 -08:00
James R. Barlow
7c5e58a497
Fix test cases that break in Docker, improve test for running in Docker
2016-02-20 23:47:37 -08:00
James R. Barlow
323b9a5f8e
Add other missing files
2016-02-20 05:34:21 -08:00
James R. Barlow
cab381a339
Add JPEG 2000 test case
2016-02-20 05:13:19 -08:00