James R. Barlow
5bc5dc93f3
v7.2.0 release notes update
v7.2.0
2018-10-05 01:27:00 -07:00
James R. Barlow
c1e18bb825
optimize: Exclude soft masks (SMasks) from optimization
...
Soft masks are only allowed to be of colorspace DeviceGray so we
shouldn't use pngquant on them. For now, avoid this exceptional
case by excluding soft masks from optimization.
2018-10-05 01:23:26 -07:00
James R. Barlow
58282ea0fb
optimize: more refactoring
...
Now properly generalized/specialized where it should be
2018-10-04 13:44:51 -07:00
James R. Barlow
891da7834c
optimize: refactor image extraction
2018-10-04 12:34:22 -07:00
James R. Barlow
5c229d48d5
optimize: Reorganize so JBIG2 can be performed on images reduced to 1bpp
...
Closes #297
2018-10-04 11:53:11 -07:00
James R. Barlow
53f660cf35
Travis: use newer macos image
2018-10-04 08:59:40 -07:00
James R. Barlow
7b66ca68f2
...and document lossy JBIG2
2018-10-04 01:31:53 -07:00
James R. Barlow
ba71c3ffbd
requirements: request pikepdf 0.3.4
2018-10-04 01:22:03 -07:00
James R. Barlow
6707ad427a
v7.2.0 release notes
2018-10-04 01:21:17 -07:00
James R. Barlow
5b84549716
Change JBIG2 lossy mode to require --jbig2-lossy
2018-10-04 01:20:49 -07:00
James R. Barlow
c74f2ee6e8
Refactor the detailed error messages
2018-10-04 00:10:59 -07:00
James R. Barlow
b32dd9f9d3
Fix lossless JBIG2 when there are multiple JBIG2 images on a single page
2018-10-03 17:40:26 -07:00
James R. Barlow
fb8b161f6c
Fix suppression of tesseract config error messages
2018-10-03 17:39:50 -07:00
James R. Barlow
baddd6d233
Remove libtiff from Brewfile
...
For some reason, brew complains about it now.
2018-10-03 16:17:59 -07:00
James R. Barlow
6f554c6ae8
tesseract: account for behavior changes when params are missing
...
Tesseract 4.0-rc1 now accepts invalid parameters in config and
won't return an error anymore. We prefer to raise an error if this
occurs.
See: 741ea00d70
2018-10-03 15:11:34 -07:00
James R. Barlow
a71e4488b3
test: fix pytest warning about direct use of a fixture
2018-10-03 15:04:46 -07:00
James R. Barlow
72156b5653
Degrade more gracefully when --optimize is set but JBIG2 is not present
2018-10-03 14:24:20 -07:00
James R. Barlow
9fa471e053
Test: send stderr to stderr, why don't we?
2018-10-03 14:23:34 -07:00
James R. Barlow
31ef2fe907
test: this error message changed case in newer Tesseract
2018-10-03 13:58:20 -07:00
James R. Barlow
9a8ec4b210
optimize: only enable lossy JBIG2 for -O3
2018-10-03 00:38:58 -07:00
James R. Barlow
75aad4cc79
optimize: Refactor convert_to_jbig2
2018-10-02 23:42:12 -07:00
James R. Barlow
4b27feca98
optimize: Disable JBIG2 lossy mode, use lossless instead
2018-10-01 12:28:54 -07:00
James R. Barlow
45522cd15f
weave: clarify comment about garbage data in ToC
2018-09-27 13:48:35 -07:00
James R. Barlow
677d9a4e76
Remove some unhelpful lambdas
2018-09-27 13:48:12 -07:00
James R. Barlow
efa7ea4fde
Fix log.error where log is None
v7.1.0
2018-09-19 23:01:27 -07:00
James R. Barlow
137a6e45f5
ghostscript: fix missing fspath for py3.5
2018-09-19 22:57:20 -07:00
James R. Barlow
29116e1dec
Change to README.md
2018-09-19 21:01:24 -07:00
James R. Barlow
87193335b9
v7.1.0 notes
2018-09-19 20:57:18 -07:00
James R. Barlow
cfd4f8a850
Improve error handling for improvements to Ghostscript text extraction
2018-09-19 20:29:18 -07:00
James R. Barlow
eaa324939f
Upgrade to pikepdf 0.3.3
...
Closes #231
2018-09-19 15:30:54 -07:00
James R. Barlow
ef70e538f7
Improve error message on handling KeyboardInterrupt
...
Closes #301
2018-09-19 01:40:26 -07:00
James R. Barlow
b7b912e56a
Fix test suite and blank pages
2018-09-17 01:12:58 -07:00
James R. Barlow
4615cf2f1e
First cut at improving text extraction speed
2018-09-16 23:34:18 -07:00
James R. Barlow
eaf772f80a
Merge v6.2.4 release notes
2018-09-16 15:45:38 -07:00
James R. Barlow
96ba75eabd
Ghostscript: fix issues in strict ASCII implementation
2018-09-16 15:41:54 -07:00
James R. Barlow
fdfe52c1ad
main: add debug option to force threads
2018-09-15 00:01:45 -07:00
James R. Barlow
932b2e2a29
main: print Ghostscript version too
2018-09-14 23:58:06 -07:00
James R. Barlow
57e489c957
main: Cleanup; support overriding sys.argv in run_pipeline
2018-09-14 23:57:35 -07:00
James R. Barlow
17a3fa671c
ghostscript: API docs update
2018-09-14 23:51:52 -07:00
James R. Barlow
2659afb4f6
Cleanup gitignore
2018-09-14 21:02:22 -07:00
James R. Barlow
7392115507
Blacklist Ghostscript 9.24 due to regressions
...
As per issue #291. Forced push to remove a copyrighted test file that was
accidentally included.
v7.0.6
2018-09-14 20:41:13 -07:00
James R. Barlow
c54d0c7eaa
v7.0.5 release notes
v7.0.5
2018-09-13 23:29:54 -07:00
James R. Barlow
b95eefc65f
Fix pikepdf version for Travis
2018-09-13 22:08:19 -07:00
James R. Barlow
686207ab7f
Check for and reject Adobe LiveCycle Designer PDFs
...
These are the ones that display a "Please wait..." message.
Closes #296
2018-09-13 21:50:51 -07:00
James R. Barlow
517b385fe5
Work around loss of Unicode DOCINFO in Ghostscript 9.24+
...
Ghostscript no longer supports UTF-16-BE-hex strings as a way of
supplying Unicode data in pdfmark so we have lost this functionality too:
http://git.ghostscript.com/?p=ghostpdl.git;a=commit;h=e997c6836d243ab37fe3a5f0d57974af95eb5eac
For users this means setting --title, --author, etc. will not work if gs
9.24 is installed, but if the file has existing metadata it might work.
For now we enforce police-state-strict ASCII, until there's time to
implement proper metadata editing. Relevant tests set to xfail.
2018-09-13 21:33:39 -07:00
James R. Barlow
795019b0c1
Work around invalid TOC entries
...
Kodak Capture Desktop, and probably other software, creates a
/Outlines entry with /First set to an invalid indirect reference to
an object that hasn't been created. This is legal in the PDF spec but
problematic for qpdf. The objgen will be (max valid object ID + 1, 0).
Because we create new objects in _weave, some TOC entries will end
up assigned to new objects we create, typically /ProcSet.
We solve the issue by refactoring page traversal and then doing it
twice, once to resolve all references (eliminating the null
reference problem) and a second pass to make our changes.
2018-09-11 14:44:16 -07:00
James R. Barlow
3127a73822
Ghostscript: no need to specify ProcessColorModel when ColorConversionStrategy
2018-09-11 11:56:05 -07:00
James R. Barlow
069ee6c91f
ghostscript: fix for 9.24 having jpeg passthrough available
2018-09-10 23:09:51 -07:00
James R. Barlow
3aac3a98ca
tests: Migrate metadata tests to pikepdf
...
For some reason PyPDF2 has begun to trigger internal errors in
pytest on macOS alone. Not sure why, but nothing is wrong that I can
see. Seemed like an opportune time to switch to pikepdf; found some
new issues in the process anyway.
2018-09-10 16:06:01 -07:00
James R. Barlow
268859a304
Merge branch 'master' of github.com:jbarlow83/OCRmyPDF: docs
2018-09-10 11:52:04 -07:00