James R. Barlow
4a27124eab
Simplify metadata for invalid xml in output
...
Removes possibly non-free resource enron1.pdf.
2020-02-12 00:07:18 -08:00
James R. Barlow
0c0d53b10f
tests: AcroForm test case did not work correctly; fixed
2019-12-30 17:50:32 -08:00
James R. Barlow
c5571388e2
Improve test coverage of _sync.py
2019-12-10 01:06:27 -08:00
James R. Barlow
5e2a7f8a56
tests: speed up several slow tests
2019-12-09 16:17:57 -08:00
James R. Barlow
0a72c12ff0
weave: add new test for link consistency
2019-05-12 03:36:33 -07:00
James R. Barlow
f34b3015b2
Prevent Ghostscript from generating invalid XMP metadata
...
If DocumentInfo contains NULs, Ghostscript will generate XMP with
NULs, which is not allowed. Repair DocumentInfo before Ghostscript sees it.
2019-01-04 13:20:41 -08:00
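A minimal sketch of the repair described in f34b3015b2, assuming DocumentInfo can be modeled as a plain dict of strings. The function name `strip_nuls` and the dict model are hypothetical and are not taken from the project's code; the only idea carried over from the commit message is removing NULs before the metadata reaches Ghostscript.

```python
from typing import Dict


def strip_nuls(docinfo: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of a DocumentInfo-like mapping with NUL characters removed.

    Hypothetical helper: the real DocumentInfo is a PDF dictionary, modeled
    here as a plain dict so the example stays self-contained.
    """
    return {key: value.replace("\x00", "") for key, value in docinfo.items()}


if __name__ == "__main__":
    dirty = {"/Title": "Quarterly\x00 Report", "/Author": "Unknown"}
    print(strip_nuls(dirty))
```

Returning a new mapping rather than editing in place keeps the original values available for comparison or logging.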
James R. Barlow
9e6b54c7ed
Add test case for Type3 fonts with no Unicode mapping
2018-11-15 21:54:26 -08:00
James R. Barlow
d3b334c10f
Test case: true type font without Unicode mapping
2018-11-15 16:22:53 -08:00
James R. Barlow
686207ab7f
Check for and reject Adobe LiveCycle Designer PDFs
...
These are the ones that display a "Please wait..." message.
Closes #296
2018-09-13 21:50:51 -07:00
James R. Barlow
795019b0c1
Work around invalid TOC entries
...
Kodak Capture Desktop and probably other software creates a
/Outlines entry with /First being set to an invalid indirect reference to
an object that hasn't been created. This is legal in the PDF spec but
problematic for qpdf. The objgen will be (max valid object ID + 1, 0).
Because we create new objects in _weave, some TOC entries will end
up assigned to new objects we create. Typically /ProcSet.
We solve the issue by refactoring page traversal and then doing it
twice, once to resolve all references (eliminating the null
reference problem) and a second pass to make our changes.
2018-09-11 14:44:16 -07:00
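The two-pass idea in 795019b0c1 can be illustrated with a toy object table. This is only a sketch under stated assumptions: objects are modeled as a dict keyed by integer ID, references as `("ref", id)` tuples, and the names `resolve_dangling` and `add_object` are hypothetical. It is not qpdf's API or the project's `_weave` code, and it scans every object instead of traversing pages.

```python
from typing import Any, Dict

ObjectTable = Dict[int, Dict[str, Any]]


def is_ref(value: Any) -> bool:
    """True if value is a toy indirect reference of the form ("ref", object_id)."""
    return isinstance(value, tuple) and len(value) == 2 and value[0] == "ref"


def resolve_dangling(objects: ObjectTable) -> int:
    """Pass 1: replace references to nonexistent objects with None.

    Returns the number of references that were nulled out.
    """
    fixed = 0
    for obj in objects.values():
        for key, value in list(obj.items()):
            if is_ref(value) and value[1] not in objects:
                obj[key] = None
                fixed += 1
    return fixed


def add_object(objects: ObjectTable, new_obj: Dict[str, Any]) -> int:
    """Pass 2 helper: store new_obj under the next free ID and return that ID."""
    new_id = max(objects, default=0) + 1
    objects[new_id] = new_obj
    return new_id


if __name__ == "__main__":
    # Object 2 has /First pointing at object 3, which was never created.
    table: ObjectTable = {
        1: {"/Outlines": ("ref", 2)},
        2: {"/First": ("ref", 3)},
    }
    resolve_dangling(table)                          # pass 1
    add_object(table, {"/ProcSet": ["/PDF"]})        # pass 2
    print(table)
```

If the second pass ran first, the new object would take ID 3 and the dangling /First reference would silently point at it, which is the failure the commit message describes.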
James R. Barlow
c171cb7286
Merge img2pdf 0.3.0 fix from v6.2.3
2018-08-01 15:17:33 -07:00
James R. Barlow
1d09061130
Revert previous commit and reject input images with alpha channel
...
Decided on this for simplicity of old release branch.
Modifies baiona.png by stripping alpha, adds baiona_alpha which
includes the alpha.
2018-07-31 23:45:28 -07:00
James R. Barlow
ed8ff79e10
Optimize some of our bigger test files
...
Only partially optimize multipage.pdf so that it hopefully
improves speed of test suite without being useless as an
optimization test.
2018-06-29 00:35:49 -07:00
James R. Barlow
9637696a54
Fix test resources naming inconsistency
2018-06-28 23:37:14 -07:00
James R. Barlow
02b3ca6862
Compress test images more heavily
2018-06-28 21:40:12 -07:00
James R. Barlow
2131ad4670
Fix --remove-background error on PDFs with colormapped images
...
It's unclear how exactly a colormapped image gets to this spot given
the tendency of other image processing tools to flatten such images,
but someone made it happen, so now we make sure the image is okay.
Closes #262
2018-04-27 17:21:01 -07:00
James R. Barlow
7368399f8b
Clarify license of two test files - https://github.com/jbarlow83/OCRmyPDF/issues/254
2018-04-17 11:56:36 -07:00
James R. Barlow
34c78a892a
Fix list table for tests/resources
...
[ci skip]
2018-04-15 23:52:19 -07:00
James R. Barlow
4f6bffb477
Update copyrights
2018-03-31 11:54:38 -07:00
James R. Barlow
45dbff6401
Fix table of contents not preserved in PDF/A
2018-03-26 02:23:19 -07:00
James R. Barlow
6756016572
Add license notice to all files
...
Source files to GPL3
Exceptions:
-tests/spoof/* to MIT
-hocrtransform.py
-_unicodefun.py
Test resources to CC BY-SA 4.0 except when otherwise noted.
Add GPL license.
2018-03-24 02:33:24 -07:00
James R. Barlow
74ca736333
Issue #223: improve text of encrypted PDF error message
2018-02-27 15:08:22 -08:00
James R. Barlow
a9da839c39
Add vector-only PDF test case
2018-02-08 00:17:35 -08:00
James R. Barlow
3a167af2c4
Nearly smallest possible PDF-1.3 with all required fields
2017-11-26 23:32:21 -08:00
James R. Barlow
965de3a235
Test case for issue #200
2017-11-26 22:52:53 -08:00
James R. Barlow
34fc1f5fd7
Add reminder that blank.pdf is not trivial
2017-09-13 01:19:18 -07:00
James R. Barlow
d04e43d46d
Update copyright info for test files
...
[ci skip]
2017-09-01 01:00:32 -07:00
James R. Barlow
52483072dc
Add a differential test that checks tesseract uses supplied word list
2017-07-21 16:40:20 -07:00
James R. Barlow
4b5cd420e1
Add new test file
2017-05-29 12:16:08 -07:00
James R. Barlow
21982cf1cb
baiona_gray remove alpha channel
2017-05-11 23:23:37 -07:00
James R. Barlow
edc01408da
Update the .png files, again, hopefully without corruption
2017-05-11 23:20:50 -07:00
James R. Barlow
bf04f03c4c
Fix corrupt test file “typewriter.png”
...
This file is not currently used in any tests, but could be, so replace
corrupt version with a useful one.
2017-05-06 22:28:34 -07:00
James R. Barlow
93e802f473
Fix issue #163, color and grayscale images JPEG compressed when not needed
2017-05-06 22:27:25 -07:00
James R. Barlow
aa859a4139
Fix #156 - NoneType has no ‘getObject’ for pages with no /Contents
2017-05-01 15:46:15 -07:00
James R. Barlow
d1a0065ef8
Create test case for Form XObjects
2017-02-14 12:51:15 -08:00
James R. Barlow
1976dc6f30
Fix issue #121 “pop from empty list” (content stream parsing error)
2017-01-26 17:24:40 -08:00
James R. Barlow
097a69d07f
pageinfo: fix “decimal.InvalidOperation: quantize result has too many digits”
...
And add new test case for this.
2016-12-08 16:04:14 -08:00
James R. Barlow
949d2ff1c2
v4.3.1 release notes
2016-11-07 14:36:08 -08:00
James R. Barlow
cc9c0d819e
Add test case for documents that get rotated incorrectly after deskew
2016-11-07 14:15:03 -08:00
James R. Barlow
fdd9b8b8ce
Optimize some of the test resources to reduce file sizes
...
Mostly by reducing RGB -> monochrome and applying JBIG2 compression
2016-11-07 14:01:23 -08:00
James R. Barlow
a86805f0d9
Remove possibly non-free page from "multipage.pdf"
2016-10-27 15:56:43 -07:00
James R. Barlow
013c5a369f
Replace redacted file with an OCR-able file
2016-10-07 12:45:22 -07:00
James R. Barlow
6baf8668a6
Replace non-free file milk.pdf with free equivalent
2016-10-06 13:10:28 -07:00
James R. Barlow
4ba2962c56
Comment on non-free files
2016-10-05 16:48:16 -07:00
James R. Barlow
4dad09cc91
resources/README: replace the other large table with a list table
2016-10-05 16:38:51 -07:00
James R. Barlow
825c0f8b2a
Note that milk.pdf is non-free, start using list-tables
2016-09-10 14:44:00 -07:00
James R. Barlow
9ca29c787b
Update description of masks.pdf to reflect what it actually tests
2016-09-01 21:21:14 -07:00
James R. Barlow
bf89e38c69
Add milk.pdf test case
2016-08-31 11:42:21 -07:00
James R. Barlow
d25397e2b0
Add test case for PDFs with masks and stencil masks
2016-08-26 15:03:27 -07:00
James R. Barlow
fef35e4eb2
Fix handling of DPI for rare case of JPEG recompression after deskew/clean
...
This test is exercised by page 4 of multipage.pdf. If all images are
JPEGs, and one of deskew/clean removes DPI information, make sure that
we can get the right information back and that the DPI stays square.
2016-07-29 01:34:52 -07:00
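One possible way to express the DPI recovery described in fef35e4eb2, as a sketch only. The commit message does not say how the fix restores the resolution or how it keeps it square; falling back to the page's original DPI and taking the larger axis are both assumptions made here for illustration, and `recover_square_dpi` is a hypothetical name.

```python
from typing import Optional, Tuple

Dpi = Tuple[float, float]


def recover_square_dpi(image_dpi: Optional[Dpi], page_dpi: Dpi) -> Dpi:
    """Return a square (x == y) DPI for an image whose DPI may be missing.

    Assumption: if a deskew/clean step dropped the DPI (image_dpi is None),
    fall back to the DPI recorded for the original page. Assumption: make the
    result square by using the larger of the two axes.
    """
    xres, yres = image_dpi if image_dpi is not None else page_dpi
    square = max(xres, yres)
    return (square, square)


if __name__ == "__main__":
    # The cleaned image lost its DPI; the original page was scanned at 300 DPI.
    print(recover_square_dpi(None, (300.0, 300.0)))
```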