James R. Barlow
04a57a3cc2
OS X -> macOS
2016-11-21 20:40:06 -08:00
James R. Barlow
d0c22ce01d
v4.3.2 release notes
v4.3.2
2016-11-10 23:16:08 -08:00
James R. Barlow
23c95e9660
ghostscript: elide overprinting to fix PDF/A errors in GS 9.20
...
It looks like GS 9.19 can incorrectly set overprinting for the text layer
even though this makes no sense in PDF/A, or at least someone produced
PDFs that have this after a Tesseract PDF -> GS PDF/A conversion. GS 9.20
complains about this. Instead of aborting, elide the feature.
See http://git.ghostscript.com/?p=ghostpdl.git;a=commitdiff;h=094d5a1880f1cb9ed320ca9353eb69436e09b594
and issue #107.
It looks like it is better to elide features and warn about elision rather
than abort with an error.
2016-11-10 14:48:02 -08:00
James R. Barlow
eecab9b95d
pdfa: fix KeyError on pdfa_dict if document has some xmp metadata but not exactly what we’re looking for
2016-11-09 05:41:12 -08:00
James R. Barlow
8abc2f113c
Merge branch 'develop'
v4.3.1
2016-11-07 14:36:50 -08:00
James R. Barlow
949d2ff1c2
v4.3.1 release notes
2016-11-07 14:36:08 -08:00
James R. Barlow
1c8b763d53
test_pageinfo: Remove bits per component test
...
The behavior of this test will ultimately depend on what version of
img2pdf is installed, since after my patch it will be able to produce
1bpp images.
2016-11-07 14:35:54 -08:00
James R. Barlow
bb91393b85
Fix “deskew-rotate” bug.
...
Turns out this occurred in any case where pdf-renderer hocr was used
and a tesseract timeout or error occurred. We created a replacement
page based on the unrotated page dimensions instead of the input image’s
dimensions.
2016-11-07 14:17:31 -08:00
James R. Barlow
cc9c0d819e
Add test case for documents that get rotated incorrectly after deskew
2016-11-07 14:15:03 -08:00
James R. Barlow
a72b8caf47
Update documentation on other languages, multilingual documents
2016-11-07 14:14:06 -08:00
James R. Barlow
fdd9b8b8ce
Optimize some of the test resources to reduce file sizes
...
Mostly by reducing RGB -> monochrome and applying JBIG2 compression
2016-11-07 14:01:23 -08:00
James R. Barlow
c096b4ca8c
Make debug dump of pageinfo at the end of processing readable
2016-11-04 02:23:02 -07:00
James R. Barlow
427add3008
Add @posttask debug hooks
2016-11-03 18:15:21 -07:00
James R. Barlow
c45871700d
Fix bug: LeptonicaErrorTrap() leaks file handles
2016-11-03 15:51:27 -07:00
Sean Whitton
6821e8eeb2
disable mathjax sphinx extension (#103)
...
Mathjax isn't actually needed for OCRmyPDF's docs, but enabling this
extension causes the browser to download a copy of mathjax.js from
cdn.mathjax.org anyway.
I have to disable this for the offline docs bundled with Debian, but
since you're not using mathjax, it would be nice to have the diff merged
upstream.
2016-11-01 21:56:57 -07:00
James R. Barlow
a4f07756a5
tesseract caching: don't transcode tesseract's output, hash source file
...
For sanity's sake, deal with tesseract streams in binary without
transcoding (via universal_newlines, etc.). The only differences are
printing messages regarding spoofing.
Also hash the source file so that changes to the cache mechanism
invalidate old cache automatically. That is probably too aggressive,
but simple and safer than the previous approach.
2016-10-28 16:44:12 -07:00
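For readers of the commit above, the idea of hashing the source file so that the cache invalidates itself can be sketched as follows. This is only an illustration under stated assumptions, not the project's actual code: the function names `source_hash` and `cache_key` are hypothetical, and SHA-1 is an arbitrary choice of digest.

```python
import hashlib


def source_hash(path):
    """Return the SHA-1 hex digest of a file's raw bytes."""
    with open(path, 'rb') as f:
        return hashlib.sha1(f.read()).hexdigest()


def cache_key(source_path, args):
    """Build a cache key from the caching code's own source and the arguments.

    Hypothetical helper: because the hash of the source file is mixed in,
    any edit to that file yields different keys, so stale entries are
    simply never looked up again.
    """
    h = hashlib.sha1()
    h.update(source_hash(source_path).encode('ascii'))
    for arg in args:
        h.update(b'\0')  # separator so ['ab', 'c'] differs from ['a', 'bc']
        h.update(arg.encode('utf-8'))
    return h.hexdigest()
```

As the commit message notes, this is aggressive: even a comment-only change to the source file discards the whole cache. The trade-off is that no separate cache version number has to be maintained.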
James R. Barlow
f24fb0e0c5
Obligatory MANIFEST.in repair
v4.3
2016-10-28 01:28:46 -07:00
James R. Barlow
73b88a0a6f
More work on documentation
2016-10-28 01:22:40 -07:00
James R. Barlow
c42f39e2d4
Update README to point to ReadTheDocs
2016-10-28 00:33:17 -07:00
James R. Barlow
5e5fe3175f
docs: OS X -> macOS branding change
2016-10-28 00:32:57 -07:00
James R. Barlow
cab65d1f11
pageinfo: add a python3.4 implementation of isclose()
2016-10-28 00:31:04 -07:00
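Background for the commit above: `math.isclose()` was only added to the standard library in Python 3.5, so Python 3.4 needs a fallback. A minimal sketch of such a fallback, following the comparison defined in PEP 485, might look like this. It is an assumption about the general approach, not a copy of what pageinfo.py contains.

```python
import math


def isclose(a, b, rel_tol=1e-09, abs_tol=0.0):
    """Approximate equality test mirroring math.isclose() from Python 3.5+."""
    if a == b:
        # Covers exact matches, including two infinities of the same sign.
        return True
    if math.isinf(a) or math.isinf(b):
        # An infinity is only close to itself, which was handled above.
        return False
    # Symmetric test: the tolerance scales with the larger magnitude.
    return abs(a - b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)
```

With the default `abs_tol=0.0`, nothing other than zero itself is considered close to zero, so comparisons against zero need an explicit `abs_tol`.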
James R. Barlow
245f05d5f4
docs: allow python setup.py install --force to bypass checks
...
ReadTheDocs needs this.
2016-10-28 00:07:26 -07:00
James R. Barlow
dda751f9e3
Merge branch 'feature/docs' into develop
...
# Conflicts:
# ocrmypdf/__main__.py
2016-10-27 23:50:08 -07:00
James R. Barlow
3d37ae988a
Update release notes for 4.3
2016-10-27 23:48:12 -07:00
James R. Barlow
717acd9855
Prevent dumping binary PDFs to stdout
2016-10-27 16:20:53 -07:00
James R. Barlow
2e4431cc63
Allow piping output to stdout
2016-10-27 16:14:42 -07:00
James R. Barlow
f7387b0859
test_stdin: simplify this test
...
No need to involve 'cat', just hook the file up to stdin.
2016-10-27 16:01:07 -07:00
James R. Barlow
a09f6b8977
Test cases: check that stdout is clear of output
...
To ensure piping to stdout is possible.
2016-10-27 15:58:24 -07:00
James R. Barlow
d63449c214
main: don't print output file location to stdout, use stderr
2016-10-27 15:57:33 -07:00
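The group of stdout-related commits above follows a common pattern for tools whose output can be piped: status messages go to stderr, the binary result goes to stdout, and writing binary data to an interactive terminal is refused. A rough sketch of that pattern is below. The function name `write_output` and the exact messages are hypothetical, and the use of an `isatty()` check is an assumption about how such a guard is typically written rather than a description of OCRmyPDF's code.

```python
import sys


def write_output(data, out=None, log=None):
    """Write binary data to stdout, keeping all messages on stderr.

    Returns False, without writing, if the output stream is a terminal.
    """
    out = sys.stdout if out is None else out
    log = sys.stderr if log is None else log
    if out.isatty():
        log.write("Refusing to write a binary PDF to a terminal.\n")
        return False
    # Text-mode streams expose the underlying byte stream as .buffer.
    out.buffer.write(data)
    out.buffer.flush()
    log.write("Wrote %d bytes to standard output.\n" % len(data))
    return True
```

Because nothing except the PDF bytes ever reaches stdout, the result can be piped straight into another program or redirected to a file.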
James R. Barlow
a86805f0d9
Remove possibly non-free page from "multipage.pdf"
2016-10-27 15:56:43 -07:00
James R. Barlow
7d2009ccef
ghostscript: log errors from stdout
2016-10-27 15:36:20 -07:00
James R. Barlow
18ae5db06d
ghostscript: ensure raster resolution is specified in integer units
2016-10-27 15:35:33 -07:00
James R. Barlow
9a1838f102
pageinfo: accept "cm/Do" image drawing without the usual "q/Q" wrapper
...
Some PDFs omit the traditional q/Q wrapper and alter ctm with a stack
depth of zero, so make our test for stack depth specifically test for
the case where the PDF calls for rendering to an uninitialized ctm.
Probably related to #97.
2016-10-27 15:35:00 -07:00
James R. Barlow
e20346032d
leptonica: add color testing functions for future experiments
2016-10-27 14:49:49 -07:00
James R. Barlow
693a27d76c
leptonica: add IPython display hook and equality test
2016-10-26 14:44:41 -07:00
James R. Barlow
203966d86b
leptonica: fix Pillow conversion for 1-bit and 8-bit gray images
2016-10-26 13:10:13 -07:00
James R. Barlow
7eca8508fd
Implement new preprocessing feature, background removal
2016-10-14 17:23:34 -07:00
James R. Barlow
b85270df1c
Merge branch 'master' into develop
2016-10-14 15:56:58 -07:00
James R. Barlow
aff597cef4
v4.2.5: update release notes, fix silly typo in pageinfo.py
v4.2.5
2016-10-13 13:26:39 -07:00
James R. Barlow
61b05b3dee
Fix issue: BitsPerComponent is an optional field, sometimes omitted
2016-10-13 13:15:27 -07:00
Julian Kahnert
453c4ef602
Update README.rst (#98)
...
`brew install tesseract` installs only the English language pack, not French, German, or Spanish
2016-10-12 11:20:58 -07:00
James R. Barlow
cf4b04f92d
The main 'quick' test should be a file that OCRs to recognizable text
2016-10-07 16:25:34 -07:00
James R. Barlow
06c6999987
Merge commit '07891d994aab92e7a14aebe1ac509aab2d4f170c'
2016-10-07 12:45:56 -07:00
James R. Barlow
013c5a369f
Replace redacted file with an OCR-able file
2016-10-07 12:45:22 -07:00
James R. Barlow
07891d994a
Replace redacted file with an OCR-able file
2016-10-07 12:44:49 -07:00
James R. Barlow
6baf8668a6
Replace non-free file milk.pdf with free equivalent
2016-10-06 13:10:28 -07:00
James R. Barlow
4ba2962c56
Comment on non-free files
2016-10-05 16:48:16 -07:00
James R. Barlow
7ad92f5db4
Merge branch 'master' of https://github.com/jbarlow83/OCRmyPDF
2016-10-05 16:39:00 -07:00
James R. Barlow
4dad09cc91
resources/README: replace the other large table with a list table
2016-10-05 16:38:51 -07:00
Sean Whitton
7b2e0c7a7a
also exclude .git in pytest.ini (#94)
2016-09-15 08:56:14 -07:00