2895 Commits

Author SHA1 Message Date
James R. Barlow
5bc5dc93f3 v7.2.0 release notes update v7.2.0 2018-10-05 01:27:00 -07:00
James R. Barlow
c1e18bb825 optimize: Exclude soft masks (SMasks) from optimization
Soft masks are only allowed to be of colorspace DeviceGray so we
shouldn't use pngquant on them. For now, avoid this exceptional
case by excluding soft masks from optimization.
2018-10-05 01:23:26 -07:00
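A minimal sketch of the exclusion described in c1e18bb825, assuming a simplified model in which each image is a plain dict keyed by object id and "SMask" holds the id of its soft mask. The function name and data layout are hypothetical and are not the project's actual code.

```python
# Hypothetical, simplified model: each image is a dict keyed by an object id.
# "SMask" holds the object id of that image's soft mask, if it has one.
def optimizable_image_ids(images):
    """Return ids of images that are not used as a soft mask by another image."""
    smask_ids = {
        props["SMask"] for props in images.values() if props.get("SMask") is not None
    }
    return sorted(objid for objid in images if objid not in smask_ids)


images = {
    10: {"ColorSpace": "DeviceRGB", "SMask": 11},
    11: {"ColorSpace": "DeviceGray"},  # soft mask for image 10: skipped
    12: {"ColorSpace": "DeviceGray"},  # ordinary grayscale image: kept
}
candidates = optimizable_image_ids(images)
```

Identifying soft masks by the reference that points at them, rather than by colorspace, keeps ordinary DeviceGray images eligible for optimization.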
James R. Barlow
58282ea0fb optimize: more refactoring
Now properly generalized/specialized where it should be
2018-10-04 13:44:51 -07:00
James R. Barlow
891da7834c optimize: refactor image extraction 2018-10-04 12:34:22 -07:00
James R. Barlow
5c229d48d5 optimize: Reorganize so JBIG2 can be performed on images reduced to 1bpp
Closes #297
2018-10-04 11:53:11 -07:00
James R. Barlow
53f660cf35 Travis: use newer macos image 2018-10-04 08:59:40 -07:00
James R. Barlow
7b66ca68f2 ...and document lossy JBIG2 2018-10-04 01:31:53 -07:00
James R. Barlow
ba71c3ffbd requirements: request pikepdf 0.3.4 2018-10-04 01:22:03 -07:00
James R. Barlow
6707ad427a v7.2.0 release notes 2018-10-04 01:21:17 -07:00
James R. Barlow
5b84549716 Change JBIG2 lossy mode to require --jbig2-lossy 2018-10-04 01:20:49 -07:00
James R. Barlow
c74f2ee6e8 Refactor the detailed error messages 2018-10-04 00:10:59 -07:00
James R. Barlow
b32dd9f9d3 Fix lossless JBIG2 when there are multiple JBIG2 images on a single page 2018-10-03 17:40:26 -07:00
James R. Barlow
fb8b161f6c Fix suppression of tesseract config error messages 2018-10-03 17:39:50 -07:00
James R. Barlow
baddd6d233 Remove libtiff from Brewfile
For some reason, brew complains about it now.
2018-10-03 16:17:59 -07:00
James R. Barlow
6f554c6ae8 tesseract: account for behavior changes when params are missing
Tesseract 4.0-rc1 now accepts invalid parameters in config and
won't return an error anymore. We prefer to raise an error if this
occurs.

See: 741ea00d70
2018-10-03 15:11:34 -07:00
James R. Barlow
a71e4488b3 test: fix pytest warning about direct use of a fixture 2018-10-03 15:04:46 -07:00
James R. Barlow
72156b5653 Degrade more gracefully when --optimize is set but JBIG2 is not present 2018-10-03 14:24:20 -07:00
James R. Barlow
9fa471e053 Test: send stderr to stderr, why don't we? 2018-10-03 14:23:34 -07:00
James R. Barlow
31ef2fe907 test: this error message changed case in newer Tesseract 2018-10-03 13:58:20 -07:00
James R. Barlow
9a8ec4b210 optimize: only enable lossy JBIG2 for -O3 2018-10-03 00:38:58 -07:00
James R. Barlow
75aad4cc79 optimize: Refactor convert_to_jbig2 2018-10-02 23:42:12 -07:00
James R. Barlow
4b27feca98 optimize: Disable JBIG2 lossy mode, use lossless instead 2018-10-01 12:28:54 -07:00
James R. Barlow
45522cd15f weave: clarify comment about garbage data in ToC 2018-09-27 13:48:35 -07:00
James R. Barlow
677d9a4e76 Remove some unhelpful lambdas 2018-09-27 13:48:12 -07:00
James R. Barlow
efa7ea4fde Fix log.error where log is None v7.1.0 2018-09-19 23:01:27 -07:00
James R. Barlow
137a6e45f5 ghostscript: fix missing fspath for py3.5 2018-09-19 22:57:20 -07:00
James R. Barlow
29116e1dec Change to README.md 2018-09-19 21:01:24 -07:00
James R. Barlow
87193335b9 v7.1.0 notes 2018-09-19 20:57:18 -07:00
James R. Barlow
cfd4f8a850 Improve error handling for improvements to Ghostscript text extraction 2018-09-19 20:29:18 -07:00
James R. Barlow
eaa324939f Upgrade to pikepdf 0.3.3
Closes #231
2018-09-19 15:30:54 -07:00
James R. Barlow
ef70e538f7 Improve error message on handling KeyboardInterrupt
Closes #301
2018-09-19 01:40:26 -07:00
James R. Barlow
b7b912e56a Fix test suite and blank pages 2018-09-17 01:12:58 -07:00
James R. Barlow
4615cf2f1e First cut at improving text extraction speed 2018-09-16 23:34:18 -07:00
James R. Barlow
eaf772f80a Merge v6.2.4 release notes 2018-09-16 15:45:38 -07:00
James R. Barlow
96ba75eabd Ghostscript: fix issues in strict ASCII implementation 2018-09-16 15:41:54 -07:00
James R. Barlow
fdfe52c1ad main: add debug option to force threads 2018-09-15 00:01:45 -07:00
James R. Barlow
932b2e2a29 main: print Ghostscript version too 2018-09-14 23:58:06 -07:00
James R. Barlow
57e489c957 main: Cleanup; support overriding sys.args in run_pipeline 2018-09-14 23:57:35 -07:00
James R. Barlow
17a3fa671c ghostscript: API docs update 2018-09-14 23:51:52 -07:00
James R. Barlow
2659afb4f6 Cleanup gitignore 2018-09-14 21:02:22 -07:00
James R. Barlow
7392115507 Blacklist Ghostscript 9.24 due to regressions
As per issue #291. Forced push to remove a copyrighted test file that was
accidentally included.
v7.0.6
2018-09-14 20:41:13 -07:00
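A version blacklist like the one in 7392115507 could be sketched as follows. The helper name, the exception type, and the set contents are assumptions for illustration; only the 9.24 entry comes from the commit message.

```python
# Hypothetical helper; only "9.24" is taken from the commit message.
BLACKLISTED_GS_VERSIONS = {"9.24"}


def check_ghostscript_version(version):
    """Return the version string unchanged, or raise if it is blacklisted."""
    if version in BLACKLISTED_GS_VERSIONS:
        raise RuntimeError(
            "Ghostscript {} is blacklisted due to regressions".format(version)
        )
    return version
```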
James R. Barlow
c54d0c7eaa v7.0.5 release notes v7.0.5 2018-09-13 23:29:54 -07:00
James R. Barlow
b95eefc65f Fix pikepdf version for Travis 2018-09-13 22:08:19 -07:00
James R. Barlow
686207ab7f Check for and reject Adobe LiveCycle Designer PDFs
These are the ones that display a "Please wait..." message.

Closes #296
2018-09-13 21:50:51 -07:00
James R. Barlow
517b385fe5 Work around loss of Unicode DOCINFO in Ghostscript 9.24+
Ghostscript no longer supports UTF-16-BE-hex strings as a way of
supplying Unicode data in pdfmark so we have lost this functionality too:
http://git.ghostscript.com/?p=ghostpdl.git;a=commit;h=e997c6836d243ab37fe3a5f0d57974af95eb5eac

For users this means setting --title, --author, etc. will not work if gs
9.24 is installed, but if the file has existing metadata it might work.

For now we enforce police-state-strict ASCII, until there's time to
implement proper metadata editing. Relevant tests set to xfail.
2018-09-13 21:33:39 -07:00
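The strict-ASCII rule from 517b385fe5 can be illustrated with a small validator. This is a sketch under stated assumptions: the function name and error text are hypothetical, and the real check may reject or report values differently.

```python
def require_ascii(field, value):
    """Return value unchanged if it is pure ASCII; otherwise raise ValueError.

    `field` is the metadata option name without dashes, e.g. "title".
    """
    try:
        value.encode("ascii")
    except UnicodeEncodeError as e:
        raise ValueError(
            "--{} must contain only ASCII characters: {!r}".format(field, value)
        ) from e
    return value
```

For example, `require_ascii("title", "Report")` passes the string through, while a title containing an accented character raises `ValueError`.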
James R. Barlow
795019b0c1 Work around invalid TOC entries
Kodak Capture Desktop, and probably other software, creates a
/Outlines entry with /First being set to an invalid indirect reference to
an object that hasn't been created. This is legal in the PDF spec but
problematic for qpdf. The objgen will be (max valid object ID + 1, 0).
Because we create new objects in _weave, some TOC entries will end
up assigned to new objects we create, typically /ProcSet.

We solve the issue by refactoring page traversal and then doing it
twice, once to resolve all references (eliminating the null
reference problem) and a second pass to make our changes.
2018-09-11 14:44:16 -07:00
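The two-pass idea in 795019b0c1 can be shown on a toy object table. Everything here is hypothetical: `Ref`, `resolve_refs`, and `add_object` are illustrative stand-ins over plain dicts, not the qpdf or pikepdf API, and the sketch assumes new objects always take the next free id.

```python
class Ref:
    """Toy indirect reference to an object id."""

    def __init__(self, objid):
        self.objid = objid


def resolve_refs(objects):
    """Pass 1: replace each reference with its target, or None if dangling."""
    for obj in objects.values():
        for key, value in list(obj.items()):
            if isinstance(value, Ref):
                obj[key] = objects.get(value.objid)  # None for a dangling ref


def add_object(objects, obj):
    """Pass 2 helper: a new object takes the next free id (max valid id + 1)."""
    new_id = max(objects) + 1
    objects[new_id] = obj
    return new_id


objects = {
    1: {"Type": "Catalog", "Outlines": Ref(2)},
    2: {"Type": "Outlines", "First": Ref(3)},  # object 3 does not exist
}
resolve_refs(objects)  # /First is now None instead of pointing at id 3
procset_id = add_object(objects, {"Type": "ProcSet"})  # takes id 3
```

Without the first pass, the dangling `/First` reference to id 3 would silently start pointing at the newly created `/ProcSet` object; resolving first turns it into a null before any ids are handed out.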
James R. Barlow
3127a73822 Ghostscript: no need to specify ProcessColorModel when ColorConversionStrategy 2018-09-11 11:56:05 -07:00
James R. Barlow
069ee6c91f ghostscript: fix for 9.24 having jpeg passthrough available 2018-09-10 23:09:51 -07:00
James R. Barlow
3aac3a98ca tests: Migrate metadata tests to pikepdf
For some reason PyPDF2 has begun to trigger internal errors in
pytest on macOS alone. Not sure why, but nothing is wrong that I can
see. Seemed like an opportune time to switch to pikepdf; found some
new issues in the process anyway.
2018-09-10 16:06:01 -07:00
James R. Barlow
268859a304 Merge branch 'master' of github.com:jbarlow83/OCRmyPDF: docs 2018-09-10 11:52:04 -07:00