943 Commits

Author SHA1 Message Date
James R. Barlow
cc7e328358 Improve some documentation for tests 2016-08-26 15:04:08 -07:00
James R. Barlow
d25397e2b0 Add test case for PDFs with masks and stencil masks 2016-08-26 15:03:27 -07:00
James R. Barlow
bc11454e1c Help text: example of shell pipeline with img2pdf v4.2.2 2016-08-25 14:58:25 -07:00
James R. Barlow
2025a096c3 Test case for stdin streaming 2016-08-25 14:46:54 -07:00
James R. Barlow
38fe14b108 Make final PDF/A output message less obtuse 2016-08-25 14:46:40 -07:00
James R. Barlow
1b7b2f3695 v4.2.2 release notes, documentation improvements 2016-08-25 14:46:09 -07:00
James R. Barlow
5d67cc76cc Update 4.2.1 release notes 2016-08-24 14:16:22 -07:00
James R. Barlow
27a3813207 Recover input filename from symlink on error message
The recent commit to accept files from stdin broke the feature of
returning the input filename in an error message; it returned the temp
filename instead, which is confusing.
v4.2.1
2016-08-23 17:38:28 -07:00
James R. Barlow
b06e0bfdcd Merge branch 'develop' 2016-08-23 16:03:07 -07:00
James R. Barlow
d616f25324 Implement DPI checking for stencil masks 2016-08-23 15:59:34 -07:00
James R. Barlow
b03028e31f setup.py -> license is MIT 2016-08-19 10:14:33 -07:00
James R. Barlow
e08c42fd3d Tweak pipeline again 2016-08-09 22:40:29 -07:00
James R. Barlow
16901f7134 Accept input from stdin if input filename is '-' 2016-08-09 15:46:24 -07:00
James R. Barlow
dffceedd85 Update the pipeline image 2016-08-09 15:45:19 -07:00
James R. Barlow
e5541e435c New test to confirm we can emit JBIG2 with appropriate settings 2016-08-03 11:35:48 -07:00
James R. Barlow
b969aad67b Tweak release notes v4.2 v4.2rc1 2016-08-03 03:36:45 -07:00
James R. Barlow
e70387b1af Add a simple test for image to PDF 2016-08-03 03:35:30 -07:00
James R. Barlow
44f47fba21 PDF/A: handle case of no XMP metadata gracefully 2016-08-03 02:57:25 -07:00
James R. Barlow
02584094a1 Suppress NUL bytes in metadata from input files 2016-08-03 02:47:44 -07:00
James R. Barlow
91d715ac93 Add test cases for --output-type 2016-08-03 02:47:18 -07:00
James R. Barlow
35addb8a33 Complain if Chinese is requested with settings known to not work
Should extend test for other Asian languages
2016-08-03 01:29:12 -07:00
James R. Barlow
d32ea8d0dd Remove dead code from qpdf merge + PyPDF2 metadata patching
I tried "qpdf merge + PyPDF2 metadata patching" first. The problem is
that PyPDF2 produces a PDF 1.3 file by default and generally I have less
confidence in it.

New approach is to stuff the Document Info metadata in the first page
with PyPDF2, cross fingers and use qpdf to merge. It's not quite as
clean and might harm the first page, but it's better than shipping
files produced by PyPDF2.
2016-08-03 01:28:27 -07:00
James R. Barlow
12575d594a Improve PDF/A validity checking at end 2016-08-03 01:26:16 -07:00
James R. Barlow
0746083301 Fix failing test case - unbound local variable in finally block 2016-08-03 01:00:38 -07:00
James R. Barlow
5c99acf6d1 Experimental change to use qpdf to merge files (disables Ghostscript)
All tests pass except one, test_input_file_not_a_pdf

Not sure if PyPDF2 metadata generation will mangle the first page.
2016-08-03 00:56:44 -07:00
James R. Barlow
2b10df7b74 leptonica: note about when it may be safe to drop <1.72 workaround 2016-08-03 00:54:37 -07:00
James R. Barlow
ebe68de4ff Functional qpdfmerge with PyPDF2 for DocumentInfo block
Tests mostly passing. For the moment this is the new default.

However, PyPDF2 produces a PDF-1.3 file, which will be wrong for some
contents and possibly should be repaired with qpdf. Again.

Looks like it could work better to merge PyPDF2 and fix everything
with qpdf.
2016-08-02 16:48:13 -07:00
James R. Barlow
b17c6a146d Experimental qpdf merging
Does not copy /Catalog metadata, but otherwise functional
2016-08-02 02:19:02 -07:00
James R. Barlow
46d837c866 Clarify trusty/precise stuff 2016-08-02 01:29:33 -07:00
James R. Barlow
24856b61e4 Fix typo in readme 2016-08-02 01:29:22 -07:00
James R. Barlow
8d0c6ff616 pyvenv -> python3 -m venv
Sadly the Python developers are removing this script
2016-08-02 01:27:50 -07:00
James R. Barlow
0b24f971cd ocrmyimage: complain about ICC profiles being presumed 2016-08-02 01:22:36 -07:00
James R. Barlow
bc5d3824bd Don't overload --oversample, use --image-dpi instead for images 2016-07-31 02:09:30 -07:00
James R. Barlow
4356983707 Suppress overly long stack traces on traverse_ruffus_exception 2016-07-31 02:06:44 -07:00
James R. Barlow
2414b79ee6 More cleanup of exception related errors 2016-07-31 01:48:13 -07:00
James R. Barlow
968e1546f0 Refactor image file triage 2016-07-31 01:47:57 -07:00
James R. Barlow
48213c9c3f Update release notes and readme 2016-07-29 15:25:16 -07:00
James R. Barlow
f385772d21 Refactor "is this an iterable that's not a string?" test 2016-07-29 15:25:02 -07:00
James R. Barlow
d257c83520 Most tests were failing at split_pages()
It seems that ruffus sometimes decides to send a ['inputfile.pdf']
instead of a bare string.
2016-07-29 14:59:17 -07:00
James R. Barlow
7b72ffec4f ocrmyimage: better handling of missing/invalid DPI 2016-07-29 14:38:07 -07:00
James R. Barlow
757f6826dc ocrmyimage - Attempt conversion to PDF if input file is not a PDF
First cut.

May have broken ruffus errors again too.
2016-07-29 14:03:19 -07:00
James R. Barlow
5df83a0d30 Travis: use Python 3.5 too 2016-07-29 13:31:40 -07:00
James R. Barlow
d70e3d3753 ruffus exceptions: for clarity only, don't iterate strings
It's a good habit to ensure any iterator test is explicit about
allowing or disallowing strings.
2016-07-29 13:31:24 -07:00
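
The habit described in the commit above can be sketched in Python. This is a minimal illustration, not the code from the commit: the names is_iterable_notstr and as_filename_list are hypothetical, and treating bytes the same way as str is an assumption.

```python
from collections.abc import Iterable


def is_iterable_notstr(obj):
    """Return True for iterables, but explicitly reject str and bytes.

    Hypothetical helper: a str is iterable character by character, which is
    rarely what code that expects "a collection of items" wants.
    """
    return isinstance(obj, Iterable) and not isinstance(obj, (str, bytes))


def as_filename_list(value):
    """Normalize a bare filename or a collection of filenames to a list.

    Hypothetical helper showing where the explicit string check matters:
    without it, 'inputfile.pdf' would be split into single characters.
    """
    if is_iterable_notstr(value):
        return list(value)
    return [value]
```

With a check like this, a caller that sometimes receives a bare string and sometimes a one-element list can treat both the same way, because the string is wrapped rather than iterated.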
James R. Barlow
0dfceedcfb Remove old OCRmyPDF 2.x from release notes; update 4.2 notes 2016-07-29 03:08:59 -07:00
James R. Barlow
2c30f4bfc5 Travis: build partly working on trusty; tweak requirements again
The build is #122
https://travis-ci.org/jbarlow83/OCRmyPDF/builds/148255615

Errors seem to be related to either Ghostscript or leptonica? Maybe
-dSAFER?
2016-07-29 03:08:01 -07:00
James R. Barlow
9e7fb52b47 Travis: add PPA to support unpaper 2016-07-29 01:57:12 -07:00
James R. Barlow
bb5fd38e38 Remove additional PPAs and try again 2016-07-29 01:47:56 -07:00
James R. Barlow
7c8cf5cfa2 Try travis-trusty
This removes some backports for packages that Ubuntu trusty offers but
for which Ubuntu precise needed help.
2016-07-29 01:44:57 -07:00
James R. Barlow
fef35e4eb2 Fix handling of DPI for rare case of JPEG recompression after deskew/clean
This test is exercised by page 4 of multipage.pdf. If all images are
JPEGs, and one of deskew/clean removes DPI information, make sure that
we can get the right information back and that the DPI stays square.
2016-07-29 01:34:52 -07:00
James R. Barlow
8f77576dc4 Fix non-square image resolution for "hocr" case; use img2pdf 0.2.1
Tesseract renderer not immediately fixable.
2016-07-28 16:43:51 -07:00