James R. Barlow
44f47fba21
PDF/A: handle case of no XMP metadata gracefully
2016-08-03 02:57:25 -07:00
James R. Barlow
02584094a1
Suppress NUL bytes in metadata from input files
2016-08-03 02:47:44 -07:00
James R. Barlow
91d715ac93
Add test cases for --output-type
2016-08-03 02:47:18 -07:00
James R. Barlow
35addb8a33
Complain if Chinese is requested with settings known to not work
...
Should extend test for other Asian languages
2016-08-03 01:29:12 -07:00
James R. Barlow
d32ea8d0dd
Remove dead code from qpdf merge + PyPDF2 metadata patching
...
I tried "qpdf merge + PyPDF2 metadata patching" first. The problem is
that PyPDF2 produces a PDF 1.3 by default and generally I have less
confidence in it.
New approach is to stuff the Document Info metadata in the first page
with PyPDF2, cross fingers and use qpdf to merge. It's not quite as
clean and might harm the first page, but it's better than shipping
files produced by PyPDF2.
2016-08-03 01:28:27 -07:00
James R. Barlow
12575d594a
Improve PDF/A validity checking at end
2016-08-03 01:26:16 -07:00
James R. Barlow
0746083301
Fix failing test case - unbound local variable in finally block
2016-08-03 01:00:38 -07:00
James R. Barlow
5c99acf6d1
Experimental change to use qpdf to merge files (disables Ghostscript)
...
All tests pass except one, test_input_file_not_a_pdf.
Not sure if PyPDF2 metadata generation will mangle the first page.
2016-08-03 00:56:44 -07:00
James R. Barlow
2b10df7b74
leptonica: note about when it may be safe to drop <1.72 workaround
2016-08-03 00:54:37 -07:00
James R. Barlow
ebe68de4ff
Functional qpdfmerge with PyPDF2 for DocumentInfo block
...
Tests mostly passing. For the moment this is the new default.
However, PyPDF2 produces a PDF-1.3, which will be wrong for some contents
and possibly should be repaired with qpdf. Again.
Looks like it could work better to merge with PyPDF2 and fix everything
with qpdf.
2016-08-02 16:48:13 -07:00
James R. Barlow
b17c6a146d
Experimental qpdf merging
...
Does not copy /Catalog metadata, but otherwise functional
2016-08-02 02:19:02 -07:00
James R. Barlow
46d837c866
Clarify trusty/precise stuff
2016-08-02 01:29:33 -07:00
James R. Barlow
24856b61e4
Fix typo in readme
2016-08-02 01:29:22 -07:00
James R. Barlow
8d0c6ff616
pyvenv -> python3 -m venv
...
Sadly the Python developers are removing this script
2016-08-02 01:27:50 -07:00
James R. Barlow
0b24f971cd
ocrmyimage: complain about ICC profiles being presumed
2016-08-02 01:22:36 -07:00
James R. Barlow
bc5d3824bd
Don't overload --oversample, use --image-dpi instead for images
2016-07-31 02:09:30 -07:00
James R. Barlow
4356983707
Suppress overly long stack traces on traverse_ruffus_exception
2016-07-31 02:06:44 -07:00
James R. Barlow
2414b79ee6
More cleanup of exception related errors
2016-07-31 01:48:13 -07:00
James R. Barlow
968e1546f0
Refactor image file triage
2016-07-31 01:47:57 -07:00
James R. Barlow
48213c9c3f
Update release notes and readme
2016-07-29 15:25:16 -07:00
James R. Barlow
f385772d21
Refactor "is this an iterable that's not a string?" test
2016-07-29 15:25:02 -07:00
James R. Barlow
d257c83520
Most tests were failing at split_pages()
...
It seems that ruffus sometimes decides to send a ['inputfile.pdf']
instead of a bare string.
2016-07-29 14:59:17 -07:00
James R. Barlow
7b72ffec4f
ocrmyimage: better handling of missing/invalid DPI
2016-07-29 14:38:07 -07:00
James R. Barlow
757f6826dc
ocrmyimage - Attempt conversion to PDF if input file is not a PDF
...
First cut.
May have broken ruffus errors again too.
2016-07-29 14:03:19 -07:00
James R. Barlow
5df83a0d30
Travis: use Python 3.5 too
2016-07-29 13:31:40 -07:00
James R. Barlow
d70e3d3753
ruffus exceptions: for clarity only, don't iterate strings
...
It's a good habit to ensure any iterator test is explicit about
allowing or disallowing strings.
2016-07-29 13:31:24 -07:00
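
A minimal sketch of the explicit "iterable but not a string" test this commit describes. The helper name `is_iterable_notstr` is an assumption for illustration and is not taken from the commit itself.

```python
from collections.abc import Iterable


def is_iterable_notstr(obj):
    """Return True for iterables other than str and bytes.

    Hypothetical helper: strings are iterable in Python, so a bare
    isinstance(obj, Iterable) check would walk them character by character.
    """
    return isinstance(obj, Iterable) and not isinstance(obj, (str, bytes))


# A list of filenames is treated as a collection; a bare filename is not.
print(is_iterable_notstr(['inputfile.pdf']))
print(is_iterable_notstr('inputfile.pdf'))
```

Being explicit here means a function that may receive either a bare filename or a list of filenames can branch on the result instead of accidentally iterating over characters.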
James R. Barlow
0dfceedcfb
Remove old OCRmyPDF 2.x from release notes; update 4.2 notes
2016-07-29 03:08:59 -07:00
James R. Barlow
2c30f4bfc5
Travis: build partly working on trusty; tweak requirements again
...
The build is #122
https://travis-ci.org/jbarlow83/OCRmyPDF/builds/148255615
Errors seem to be related to either Ghostscript or leptonica? Maybe
-dSAFER?
2016-07-29 03:08:01 -07:00
James R. Barlow
9e7fb52b47
Travis: add PPA to support unpaper
2016-07-29 01:57:12 -07:00
James R. Barlow
bb5fd38e38
Remove additional PPAs and try again
2016-07-29 01:47:56 -07:00
James R. Barlow
7c8cf5cfa2
Try travis-trusty
...
This removes some backports for packages that Ubuntu trusty offers but
for which Ubuntu precise needed help.
2016-07-29 01:44:57 -07:00
James R. Barlow
fef35e4eb2
Fix handling of DPI for rare case of JPEG recompression after deskew/clean
...
This test is exercised by page 4 of multipage.pdf. If all images are
JPEGs, and one of deskew/clean removes DPI information, make sure that
we can get the right information back and that the DPI stays square.
2016-07-29 01:34:52 -07:00
James R. Barlow
8f77576dc4
Fix non-square image resolution for "hocr" case; use img2pdf 0.2.1
...
Tesseract renderer not immediately fixable.
2016-07-28 16:43:51 -07:00
James R. Barlow
b3fcf24a26
Refactor DPI: fix regressions in test suite
...
Some called functions are particular about the data format of DPI and
don't like to deal with the Decimal() returned by PyPDF2. Convert to
float and int where needed.
2016-07-28 00:19:32 -07:00
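
A minimal sketch of the kind of conversion this commit describes, assuming the DPI arrives as a `decimal.Decimal`. The helper names `dpi_as_float` and `dpi_as_int` are hypothetical and are not from the source.

```python
from decimal import Decimal


def dpi_as_float(value):
    """Convert a DPI value (possibly a Decimal) to float."""
    return float(value)


def dpi_as_int(value):
    """Convert a DPI value to the nearest int, for callers that reject fractions."""
    return int(round(float(value)))


# Hypothetical value standing in for a Decimal read from a PDF.
dpi = Decimal('299.6')
print(dpi_as_float(dpi), dpi_as_int(dpi))
```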
James R. Barlow
16e4d342d2
Bug fix: --force-ocr should still run on pages with no images
...
Useful for people who want to reprocess text.
This also requires --oversample because DPI is undefined. To be fixed
in next commit.
2016-07-27 15:06:49 -07:00
James R. Barlow
8458a51860
Tighten requirements and dependencies
2016-07-27 14:47:59 -07:00
James R. Barlow
636d1903b3
Ghostscript: do raster output with -dSAFER
...
-dSAFER does not work when rendering PDF/A, because that needs to load
the ICC file, and -dSAFER prevents access to external files.
2016-07-27 00:54:40 -07:00
jbarlow83
514efa36fc
Readme: Add table of contents, brew install tesseract --with-language packs
v4.1.4
2016-07-24 11:21:46 -07:00
James R. Barlow
bd48f40d3d
v4.1.4 release notes
v4.1.4rc1
2016-07-17 00:35:06 -07:00
James R. Barlow
c02dbc809a
Merge commit '68cf9cbd87c188823027f9d1bfe9029017e7281f' into develop
2016-07-17 00:29:48 -07:00
James R. Barlow
410111d6fb
Bug fix: Monochrome images with ICC treated as full color images
...
Issue #79.
User submitted PDF with ICC profile attached to the monochrome image
in the input file, which is not common but useful for PDFs that want to
define how light the paper is or how dark the black is. The code was
written to assume unusual images are full color unless it can prove
otherwise. Handle this simple case. Other ICC cases should be tested.
2016-07-17 00:29:32 -07:00
jbarlow83
68cf9cbd87
.rst: add code-block markup
2016-07-05 14:03:55 -07:00
jbarlow83
c9b2540d9d
Fix some .rst formatting errors
2016-07-05 13:48:19 -07:00
jbarlow83
1bacf35a2c
Update license information for encrypted_algo4.pdf
2016-06-24 14:25:15 -07:00
jbarlow83
8aef0d9277
Merge pull request #76 from Jmuccigr/patch-2
...
Adding explicit reference to help
2016-06-24 14:21:23 -07:00
John Muccigrosso
b2fa8645ba
Adding explicit reference to help
2016-06-24 13:44:12 -05:00
James R. Barlow
c96823a648
v4.1.3 release notes
v4.1.3
v4.1.3rc1
2016-06-23 13:47:56 -07:00
James R. Barlow
3807b7d655
Merge branch 'feature/leptfun' into develop
2016-06-23 13:45:35 -07:00
James R. Barlow
a45505cf1d
Fix order of operations in matrix multiplication
...
Issue #73. The order of operations happens to not matter for scaling
but does matter for translation. We only need scaling to find the DPI,
so the error was not noticed. Mainly useful to other users of this
library.
2016-06-23 13:36:23 -07:00
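
A small illustration of why the order goes unnoticed for scaling but matters for translation. This is a sketch, not the library's code: it assumes 3x3 affine matrices in the row-vector convention that PDF uses, and the helper names `matmul`, `scale`, and `translate` are hypothetical.

```python
def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def scale(sx, sy):
    return [[sx, 0, 0], [0, sy, 0], [0, 0, 1]]


def translate(tx, ty):
    # Row-vector convention: the translation sits in the bottom row.
    return [[1, 0, 0], [0, 1, 0], [tx, ty, 1]]


s1, s2, t = scale(2, 3), scale(5, 7), translate(10, 20)

# Two pure scalings commute, so either order gives the same matrix.
assert matmul(s1, s2) == matmul(s2, s1)

# Scaling combined with translation does not commute.
st = matmul(s1, t)
ts = matmul(t, s1)
assert st != ts

# The scale factors are identical either way; only the translation differs.
assert (st[0][0], st[1][1]) == (ts[0][0], ts[1][1])
print(st[2], ts[2])
```

Because the scale entries come out the same in both orders, code that reads only those entries to compute DPI would give correct results even with the operands swapped.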
James R. Barlow
b4a734fc0d
Test case for "algorithm 4" test
...
Algorithm 4 -> PDF version 1.6
2016-06-23 13:21:26 -07:00