2676 Commits

Author SHA1 Message Date
James R. Barlow
9a8ec4b210 optimize: only enable lossy JBIG2 for -O3 2018-10-03 00:38:58 -07:00
James R. Barlow
75aad4cc79 optimize: Refactor convert_to_jbig2 2018-10-02 23:42:12 -07:00
James R. Barlow
4b27feca98 optimize: Disable JBIG2 lossy mode, use lossless instead 2018-10-01 12:28:54 -07:00
James R. Barlow
45522cd15f weave: clarify comment about garbage data in ToC 2018-09-27 13:48:35 -07:00
James R. Barlow
677d9a4e76 Remove some unhelpful lambdas 2018-09-27 13:48:12 -07:00
James R. Barlow
efa7ea4fde Fix log.error where log is None v7.1.0 2018-09-19 23:01:27 -07:00
James R. Barlow
137a6e45f5 ghostscript: fix missing fspath for py3.5 2018-09-19 22:57:20 -07:00
James R. Barlow
29116e1dec Change to README.md 2018-09-19 21:01:24 -07:00
James R. Barlow
87193335b9 v7.1.0 notes 2018-09-19 20:57:18 -07:00
James R. Barlow
cfd4f8a850 Improve error handling for improvements to Ghostscript text extraction 2018-09-19 20:29:18 -07:00
James R. Barlow
eaa324939f Upgrade to pikepdf 0.3.3
Closes #231
2018-09-19 15:30:54 -07:00
James R. Barlow
ef70e538f7 Improve error message on handling KeyboardInterrupt
Closes #301
2018-09-19 01:40:26 -07:00
James R. Barlow
b7b912e56a Fix test suite and blank pages 2018-09-17 01:12:58 -07:00
James R. Barlow
4615cf2f1e First cut at improving text extraction speed 2018-09-16 23:34:18 -07:00
James R. Barlow
eaf772f80a Merge v6.2.4 release notes 2018-09-16 15:45:38 -07:00
James R. Barlow
96ba75eabd Ghostscript: fix issues in strict ASCII implementation 2018-09-16 15:41:54 -07:00
James R. Barlow
fdfe52c1ad main: add debug option to force threads 2018-09-15 00:01:45 -07:00
James R. Barlow
932b2e2a29 main: print Ghostscript version too 2018-09-14 23:58:06 -07:00
James R. Barlow
57e489c957 main: Cleanup; support overriding sys.args in run_pipeline 2018-09-14 23:57:35 -07:00
James R. Barlow
17a3fa671c ghostscript: API docs update 2018-09-14 23:51:52 -07:00
James R. Barlow
2659afb4f6 Cleanup gitignore 2018-09-14 21:02:22 -07:00
James R. Barlow
7392115507 Blacklist Ghostscript 9.24 due to regressions
As per issue #291. Forced push to remove a copyrighted test file that was
accidentally included.
v7.0.6
2018-09-14 20:41:13 -07:00
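A version blacklist like the one this commit describes can be sketched in a few lines. This is a minimal illustration and not the project's actual code: the names `BLACKLISTED_GS_VERSIONS`, `parse_version` and `is_blacklisted` are hypothetical, and only version 9.24 comes from the commit message.

```python
from typing import Tuple

# Hypothetical blacklist; the commit message only names 9.24.
BLACKLISTED_GS_VERSIONS = {(9, 24)}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '9.24' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def is_blacklisted(version_text: str) -> bool:
    """Compare only major.minor, so a patch release such as '9.24.1' also matches."""
    return parse_version(version_text)[:2] in BLACKLISTED_GS_VERSIONS
```

Comparing parsed tuples rather than raw strings avoids false matches on versions that merely contain the same digits.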
James R. Barlow
c54d0c7eaa v7.0.5 release notes v7.0.5 2018-09-13 23:29:54 -07:00
James R. Barlow
b95eefc65f Fix pikepdf version for Travis 2018-09-13 22:08:19 -07:00
James R. Barlow
686207ab7f Check for and reject Adobe LiveCycle Designer PDFs
These are the ones that display a "Please wait..." message.

Closes #296
2018-09-13 21:50:51 -07:00
James R. Barlow
517b385fe5 Work around loss of Unicode DOCINFO in Ghostscript 9.24+
Ghostscript no longer supports UTF-16-BE-hex strings as a way of
supplying Unicode data in pdfmark so we have lost this functionality too:
http://git.ghostscript.com/?p=ghostpdl.git;a=commit;h=e997c6836d243ab37fe3a5f0d57974af95eb5eac

For users this means setting --title, --author, etc. will not work if gs
9.24 is installed, but if the file has existing metadata it might work.

For now we enforce police-state-strict ASCII, until there's time to
implement proper metadata editing. Relevant tests set to xfail.
2018-09-13 21:33:39 -07:00
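The strict-ASCII rule described above could be checked roughly as follows. This is a sketch under assumptions: `is_strict_ascii` and `non_ascii_fields` are hypothetical helpers, and the field names are illustrative rather than taken from the project's code.

```python
from typing import Dict, List, Optional


def is_strict_ascii(value: str) -> bool:
    """Return True if every character in value is encodable as ASCII."""
    try:
        value.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


def non_ascii_fields(fields: Dict[str, Optional[str]]) -> List[str]:
    """Return the names of metadata fields whose values are not pure ASCII."""
    return [
        name
        for name, value in fields.items()
        if value is not None and not is_strict_ascii(value)
    ]
```

A caller could reject the command line, or skip the affected fields, when `non_ascii_fields` returns a non-empty list.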
James R. Barlow
795019b0c1 Work around invalid TOC entries
Kodak Capture Desktop and probably other software creates a
/Outlines entry with /First being set to an invalid indirect reference to
an object that hasn't been created. This is legal in the PDF spec but
problematic for qpdf. The objgen will be (max valid object ID + 1, 0).
Because we create new objects in _weave, some TOC entries will end
up pointing at new objects we create, typically /ProcSet.

We solve the issue by refactoring page traversal and then doing it
twice, once to resolve all references (eliminating the null
reference problem) and a second pass to make our changes.
2018-09-11 14:44:16 -07:00
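The two-pass idea can be illustrated with a toy object table. This is not pikepdf or qpdf code: the `Ref` class, the dict-based object store and both helper functions are hypothetical stand-ins, chosen only to show why dangling references should be resolved before any new objects are allocated.

```python
class Ref:
    """Toy indirect reference that only stores an object number."""

    def __init__(self, objnum):
        self.objnum = objnum


def resolve_all(objects):
    """Pass 1: replace references to nonexistent objects with None (null)."""
    for obj in objects.values():
        for key, value in list(obj.items()):
            if isinstance(value, Ref) and value.objnum not in objects:
                obj[key] = None


def add_object(objects, new_obj):
    """Pass 2 helper: store new_obj under the next free object number."""
    objnum = max(objects) + 1
    objects[objnum] = new_obj
    return objnum


objects = {
    1: {"Type": "Catalog", "Outlines": Ref(2)},
    2: {"Type": "Outlines", "First": Ref(3)},  # object 3 was never created
}

resolve_all(objects)                                  # first pass
new_num = add_object(objects, {"Type": "ProcSet"})    # second pass
```

Without the first pass, the new object would take number 3 and the outline's `First` entry would silently point at it.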
James R. Barlow
3127a73822 Ghostscript: no need to specify ProcessColorModel when ColorConversionStrategy 2018-09-11 11:56:05 -07:00
James R. Barlow
069ee6c91f ghostscript: fix for 9.24 having jpeg passthrough available 2018-09-10 23:09:51 -07:00
James R. Barlow
3aac3a98ca tests: Migrate metadata tests to pikepdf
For some reason PyPDF2 has begun to trigger internal errors in
pytest on macOS alone. Not sure why, but nothing is wrong that I can
see. Seemed like an opportune time to switch to pikepdf; found some
new issues in the process anyway.
2018-09-10 16:06:01 -07:00
James R. Barlow
268859a304 Merge branch 'master' of github.com:jbarlow83/OCRmyPDF: docs 2018-09-10 11:52:04 -07:00
James R. Barlow
a96710aa7b leptonica: update comments 2018-09-10 11:47:38 -07:00
James R. Barlow
edcc58826a pdfinfo: remove some dead code 2018-09-10 11:47:00 -07:00
James R. Barlow
7077c8220a Fix rst formatting in release notes 2018-09-10 11:46:17 -07:00
Mateus Seenem Tavares
f7cbf68edd Updating Arch Linux installation (#288)
* Updating Arch Linux installation

And adding a workaround for an incorrect dependency definition on https://aur.archlinux.org/packages/python-pikepdf/

* Remove comment about temporary workaround
2018-08-31 12:30:18 -07:00
James R. Barlow
68a58ee8a5 docs: fix hyperlinking of jbig2 page (again) and cleanup release notes 2018-08-27 01:25:30 -07:00
James R. Barlow
3109ec5091 v7.0.4 notes v7.0.4 2018-08-24 12:41:53 -07:00
James R. Barlow
e0599fe8d7 Require pikepdf 0.3.2 2018-08-24 12:41:43 -07:00
James R. Barlow
a749240589 docs: mention pikepdf install more clearly 2018-08-22 03:19:46 -07:00
James R. Barlow
6decdaa062 Try setuptools_scm_git_archive again 2018-08-20 15:45:51 -07:00
James R. Barlow
4d5c9b8cdf Fix error in optimize.py on PNGs at -O2
Error was
TypeError: unsupported operand type(s) for -: 'tuple' and 'int'
2018-08-20 15:45:34 -07:00
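The error quoted above is what Python raises when an `int` is subtracted from a `tuple`. The snippet below only reproduces that class of error. The variable names are hypothetical and the actual expression in optimize.py is not shown in this log.

```python
def subtract_one(value):
    """Subtract 1 from value, returning the error message if that fails."""
    try:
        return value - 1
    except TypeError as exc:
        return str(exc)


# A tuple where a plain number was expected triggers the TypeError.
message = subtract_one((8, 8))

# A plain integer works as intended.
result = subtract_one(8)
```

A common remedy for this kind of bug is to unpack or index the tuple before doing arithmetic on it.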
James R. Barlow
1e23ea5364 Remove pikepdf < 0.3 compatibility shims since > 0.3.1 is now required v7.0.3 2018-08-10 17:01:03 -07:00
James R. Barlow
cf9a8a91b5 Require pikepdf 0.3.1 2018-08-10 16:59:08 -07:00
James R. Barlow
05d3a65e94 docs: Fix links to JBIG2 encoder page
[ci skip]
2018-08-09 21:14:50 -07:00
James R. Barlow
c043552f8b Fix travis.yml syntax v7.0.2 2018-08-03 14:02:20 -07:00
James R. Barlow
487ee2b6c9 Notes for v7.0.2 2018-08-03 13:37:18 -07:00
James R. Barlow
8013fd50da Draw preview image at full resolution
As reported in #281 and confirmed by the test file in #279, downsampling
the preview adversely affects quality of image rotation especially for
small font sizes and marginal scans.

Rendering at full size restores rotation accuracy. It makes rotation a little
inefficient, since the page is rasterized twice; to be addressed later.
2018-08-03 13:32:10 -07:00
James R. Barlow
4ec9ec12e3 docs: Describe PDF optimization 2018-08-03 13:10:18 -07:00
James R. Barlow
ed96594727 Regroup installation page content around platforms
Also separate out JBIG2 encoder instructions so that distributions that
delete installation.rst won't omit this information.
2018-08-03 12:47:25 -07:00
James R. Barlow
91b7193249 Travis: use xenial for Python 3.7 2018-08-03 11:52:23 -07:00