James R. Barlow
4046766ca5
Fix Python 3.5 test suite failure on symlinks
...
Did not account for API difference in pathlib
2018-03-02 16:57:46 -08:00
James R. Barlow
810390df0b
pipeline: Refactor duplicate with clause
2018-03-02 16:57:15 -08:00
James R. Barlow
de749bc7ae
Fix regression - output to stdout broken
2018-03-02 15:35:51 -08:00
James R. Barlow
9965b8800c
Move Dockerfiles out of the way
2018-03-02 15:26:41 -08:00
James R. Barlow
ab870fddd6
Dockerignore: glob supported now
2018-03-02 15:24:42 -08:00
James R. Barlow
a79d6807cf
Dockerfiles: remove deprecated MAINTAINER tag
2018-03-02 15:24:28 -08:00
James R. Barlow
8185fb7e43
Migrate Travis CI setup to Brewfile
2018-03-02 15:16:05 -08:00
Tucker Barbour
4b10929b25
Fix Homebrew python package ( #227 )
...
Homebrew removed python3 and python now defaults to version 3. Here we
use `brew upgrade python` to upgrade the pre-installed version of
python to python3.
2018-03-02 14:59:23 -08:00
Tucker Barbour
f6c70312c9
Fix Homebrew python package
...
Homebrew removed python3 and python now defaults to version 3. Here we
use `brew upgrade python` to upgrade the pre-installed version of
python to python3.
2018-03-02 14:26:13 +00:00
Tucker Barbour
9fd9c7a51f
Scale BoundingBox and Text elements to account for additional space.
...
Here we scale the pt width used for the BoundingBox and the Text
element when adding whitespace to work around limitations of the
PDF.js viewer. This fixes a regression noticed when selecting text
elements in Chrome and PDFium: the width of the Text element and
BoundingBox had not been adjusted for the additional whitespace, so
the highlighting was offset slightly.
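The width adjustment described in this commit can be illustrated with a minimal sketch. It is not the renderer's actual code: the function names, the fixed-advance font model (`char_width_pt` per glyph), and the choice to widen the box in proportion to the existing stretch are all assumptions made for illustration.

```python
def horizontal_scale(text, box_width_pt, char_width_pt):
    """Percent stretch needed to fit text into box_width_pt.

    Assumes a hypothetical fixed-advance font: every glyph, including the
    space, is char_width_pt wide before scaling. Text must be non-empty.
    """
    return 100.0 * box_width_pt / (len(text) * char_width_pt)


def add_interword_space(word, box_width_pt, char_width_pt):
    """Append one space and widen the box so the word keeps its extent.

    Hypothetical helper. The box grows by one space at the same stretch
    that was already applied to the bare word.
    """
    stretch = box_width_pt / (len(word) * char_width_pt)
    return word + " ", box_width_pt + char_width_pt * stretch


# The bare word fits its OCR box at 150 % horizontal scale.
before = horizontal_scale("word", 30.0, 5.0)

# Appending a space without widening the box compresses the glyphs,
# which is the kind of offset the commit message describes.
unadjusted = horizontal_scale("word ", 30.0, 5.0)

# Widening the box along with the text restores the original scale.
text, width = add_interword_space("word", 30.0, 5.0)
after = horizontal_scale(text, width, 5.0)
```

In this toy model `before` and `after` are equal, while `unadjusted` is smaller, so the highlighted region would no longer line up with the word on the page.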
2018-03-02 11:18:47 +00:00
Charles Forcey
422e619978
Add a note to the documentation about interword-spaces
2018-03-01 13:15:03 -05:00
Tucker Barbour
e6e34251c6
Add option to explicitly add interword spaces to HOCR pdf-renderer
...
This commit includes an optional workaround for limitations of the
PDF.js viewer described in
https://github.com/jbarlow83/OCRmyPDF/issues/133 . Here we explicitly
add an additional space to text elements before drawing them on the PDF
canvas when using the HOCR renderer. This option does not apply to
other PDF renderers in OCRmyPDF and is turned off by default.
2018-03-01 13:15:03 -05:00
James R. Barlow
2d8aad1086
Improve docs
2018-03-01 00:24:38 -08:00
James R. Barlow
74ca736333
Issue #223 : improve text of encrypted PDF error message
2018-02-27 15:08:22 -08:00
James R. Barlow
5e4fd8b0b9
compile_leptonica should have shebang
2018-02-24 12:49:36 -08:00
James R. Barlow
8ab8132411
lint: unused variables, wildcard imports
2018-02-24 12:48:52 -08:00
James R. Barlow
6899dd46e4
lint: remove extraneous backslash
2018-02-24 12:42:37 -08:00
James R. Barlow
8ad0697a20
lint: remove duplicate property definition
2018-02-24 12:42:03 -08:00
James R. Barlow
b47e5672e6
Remove old test case that no longer works
2018-02-24 12:40:14 -08:00
James R. Barlow
45c7bd9a60
lint: Remove shebangs from non-executable files
2018-02-24 12:38:58 -08:00
James R. Barlow
e7bcb95635
Fix pylint errors
2018-02-24 11:59:01 -08:00
James R. Barlow
3de83627a9
Handle output to /dev/null or directory ( #219 )
...
Previously we threw an exception if the output name was a directory (but only after doing OCR), and triggered a PermissionError when trying to flip the permission bits of /dev/null, due to the shutil.copyfile implementation. Instead of copying the file, use shutil.copyfileobj, which should also respect umask etc.
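A minimal sketch of the approach this message describes, assuming two hypothetical helpers (`check_output_path` and `copy_contents`) that are not the project's actual function names. It rejects a directory before any work is done, then copies bytes only, so no permission bits are changed on the destination.

```python
import os
import shutil


def check_output_path(output_file):
    """Fail early, before any OCR work, if the output is a directory."""
    if os.path.isdir(output_file):
        raise IsADirectoryError(
            "Output is a directory: {}".format(output_file)
        )


def copy_contents(input_file, output_file):
    """Copy file contents only.

    open() creates a missing destination subject to the process umask,
    and shutil.copyfileobj never modifies permission bits, so special
    files such as /dev/null are written to without further changes.
    """
    with open(input_file, "rb") as src, open(output_file, "wb") as dst:
        shutil.copyfileobj(src, dst)
```

Calling `check_output_path` during argument validation, and `copy_contents(result, os.devnull)` at the end, would cover both cases mentioned above.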
2018-02-19 22:15:07 -08:00
James R. Barlow
f248576994
Change instructions to point away from private tap
2018-02-19 17:33:58 -08:00
James R. Barlow
aac5b6de3b
Update autobrew script to match final changes
2018-02-17 00:12:03 -08:00
James R. Barlow
24435f11e0
We are now in homebrew
2018-02-15 17:42:16 -08:00
James R. Barlow
a9da839c39
Add vector-only PDF test case
v5.6.0
2018-02-08 00:17:35 -08:00
James R. Barlow
fa2c0296d6
v5.6.0 release notes, docs
2018-02-07 16:48:04 -08:00
James R. Barlow
1dfc32d7e6
Preserve "text as curves" vector content
...
Never updated the checking logic to deal with a pure vector file with no text that needs an OCR layer. This is doable, so allow it.
2018-02-07 16:05:48 -08:00
James R. Barlow
4a61beae41
autobrew: use homebrew's built-in test fixture
2018-02-05 11:10:33 -08:00
James R. Barlow
bd30587bf1
Update depends_on order
2018-01-29 13:16:07 -08:00
James R. Barlow
e0070e3e18
Update Dockerfile to use Ubuntu 17.10 (issue #214 )
2018-01-28 15:30:41 -08:00
James R. Barlow
019513696b
Ghostscript spoof scripts did not report their --version correctly
v5.5
2018-01-10 17:08:14 -08:00
James R. Barlow
ad7a4476db
hugemono.pdf needs --max-image-mpixels to pass with Pillow 5.0
2018-01-10 16:55:18 -08:00
James R. Barlow
4812b20fb2
Fix tesseract_noop.py generating wrong size of output PDF in tests
...
This caused trouble before with test_deskew
2018-01-10 16:35:31 -08:00
James R. Barlow
f5e07c9427
Fix Ghostscript parameter order
2018-01-10 16:33:26 -08:00
James R. Barlow
75dcb90621
Niceties: when environment variable overload is used, clarify we're not checking the PATH
2018-01-10 16:33:03 -08:00
James R. Barlow
dfc0434cc2
Update requirements to set Pillow to 5.0
2018-01-10 15:45:23 -08:00
James R. Barlow
882fc2257c
Add --max-image-mpixels argument to support Pillow 5.0
2018-01-10 15:43:59 -08:00
James R. Barlow
41e83b52fc
Document process for redoing OCR
2018-01-10 15:39:58 -08:00
James R. Barlow
47758b4d8f
Reactivate autobrew
2018-01-10 15:39:36 -08:00
James R. Barlow
6bf1f970a0
Fix some parameter validation for --output-type pdfa-1 and pdfa-2
2018-01-10 11:50:08 -08:00
James R. Barlow
7d451f101f
Detect old versions of Ghostscript and warn about them ( #208 )
2018-01-10 11:47:39 -08:00
James R. Barlow
7edbfe0e40
Update autobrew template
2018-01-09 12:36:14 -08:00
James R. Barlow
91b42cbfa8
Fix issue in sandwich renderer when skipping OCR on a rotated and deskewed page
...
If OCR is skipped due to --tesseract-timeout or similar, and the skip page is rotated with /Rotate, and the skip page was deskewed or had other image processing, then the skip page was created with the wrong dimensions causing the output page to be cropped.
2018-01-09 00:17:53 -08:00
James R. Barlow
6907df41b4
Disable autobrew until homebrew accepts the official release
2018-01-08 23:26:58 -08:00
James R. Barlow
2cebd90cbd
Fix brew audit --strict warnings
2017-12-09 12:14:28 -08:00
James R. Barlow
376a121aaa
Re-enable macos
v5.4.4
2017-11-29 15:06:19 -08:00
James R. Barlow
da11fd17ee
qpdf dummy: needs to return version now
2017-11-29 14:35:37 -08:00
James R. Barlow
a40689a0ff
tesseract: handle return of bytes properly in error cases
2017-11-29 14:35:26 -08:00
James R. Barlow
44a45fc3fb
Add "bad UTF8 output from Tesseract" test
2017-11-29 14:08:07 -08:00