2676 Commits

Author SHA1 Message Date
James R. Barlow
4046766ca5 Fix Python 3.5 test suite failure on symlinks
Did not account for API difference in pathlib
2018-03-02 16:57:46 -08:00
James R. Barlow
810390df0b pipeline: Refactor duplicate with clause 2018-03-02 16:57:15 -08:00
James R. Barlow
de749bc7ae Fix regression - output to stdout broken 2018-03-02 15:35:51 -08:00
James R. Barlow
9965b8800c Move Dockerfiles out of the way 2018-03-02 15:26:41 -08:00
James R. Barlow
ab870fddd6 Dockerignore: glob supported now 2018-03-02 15:24:42 -08:00
James R. Barlow
a79d6807cf Dockerfiles: remove deprecated MAINTAINER tag 2018-03-02 15:24:28 -08:00
James R. Barlow
8185fb7e43 Migrate Travis CI setup to Brewfile 2018-03-02 15:16:05 -08:00
Tucker Barbour
4b10929b25 Fix Homebrew python package (#227)
Homebrew removed python3 and python now defaults to version 3. Here we
use `brew upgrade python` to upgrade the pre-installed version of
python to python3.
2018-03-02 14:59:23 -08:00
Tucker Barbour
f6c70312c9
Fix Homebrew python package
Homebrew removed python3 and python now defaults to version 3. Here we
use `brew upgrade python` to upgrade the pre-installed version of
python to python3.
2018-03-02 14:26:13 +00:00
Tucker Barbour
9fd9c7a51f
Scale BoundingBox and Text elements to account for additional space.
Here we are manually scaling the pt width used for the BoundingBox and
the Text element when manually adding whitespace to account for
limitations of the PDF.js viewer. This fixes an initial regression
noticed when selecting text elements in Chrome and PDFium. The width
of the Text element and BoundingBox had not been adjusted for the
additional whitespace, so the highlighting was offset slightly.
2018-03-02 11:18:47 +00:00
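The width adjustment described in this commit can be sketched roughly as follows. This is only an illustration, assuming the element width is scaled in proportion to the measured text width with and without the extra whitespace. The function name `scale_width_for_space` and its parameters are hypothetical and not taken from the OCRmyPDF source.

```python
def scale_width_for_space(box_width_pt, text_width, text_width_with_space):
    """Widen a box in proportion to the whitespace appended to its text.

    box_width_pt: original width of the bounding box, in points.
    text_width: measured width of the text alone (any consistent unit).
    text_width_with_space: measured width of the text plus the added space.
    """
    if text_width <= 0:
        # Nothing to scale against; leave the box unchanged.
        return box_width_pt
    return box_width_pt * (text_width_with_space / text_width)
```

Scaling the bounding box and the text element by the same factor keeps the selection highlight aligned with the drawn text.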
Charles Forcey
422e619978 Add a note to the documentation about interword-spaces 2018-03-01 13:15:03 -05:00
Tucker Barbour
e6e34251c6 Add option to explicitly add interword spaces to HOCR pdf-renderer
This commit includes an optional workaround for limitations of the
PDF.js viewer described in
https://github.com/jbarlow83/OCRmyPDF/issues/133. Here we explicitly
add an additional space to text elements before drawing them on the PDF
canvas when using the HOCR renderer. This option does not apply to
other PDF renderers in OCRmyPDF and is turned off by default.
2018-03-01 13:15:03 -05:00
James R. Barlow
2d8aad1086 Improve docs 2018-03-01 00:24:38 -08:00
James R. Barlow
74ca736333 Issue #223: improve text of encrypted PDF error message 2018-02-27 15:08:22 -08:00
James R. Barlow
5e4fd8b0b9 compile_leptonica should have shebang 2018-02-24 12:49:36 -08:00
James R. Barlow
8ab8132411 lint: unused variables, wildcard imports 2018-02-24 12:48:52 -08:00
James R. Barlow
6899dd46e4 lint: remove extraneous backslash 2018-02-24 12:42:37 -08:00
James R. Barlow
8ad0697a20 lint: remove duplicate property definition 2018-02-24 12:42:03 -08:00
James R. Barlow
b47e5672e6 Remove old test case that no longer works 2018-02-24 12:40:14 -08:00
James R. Barlow
45c7bd9a60 lint: Remove shebangs from non-executable files 2018-02-24 12:38:58 -08:00
James R. Barlow
e7bcb95635 Fix pylint errors 2018-02-24 11:59:01 -08:00
James R. Barlow
3de83627a9 Handle output to /dev/null or directory (#219)
Previously we threw an exception if the output name was a directory, but only after doing OCR, and we triggered a PermissionError when trying to flip the permission bits of /dev/null because of how shutil.copyfile is implemented. Instead of copying the file, use shutil.copyfileobj, which should also respect umask etc.
2018-02-19 22:15:07 -08:00
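A minimal sketch of the approach this commit describes, assuming the output is written by streaming bytes into an already-opened destination. The helper names `check_output_path` and `write_output` are hypothetical and are not taken from the OCRmyPDF source.

```python
import os
import shutil


def check_output_path(dest_path):
    """Reject a directory as the output target before any expensive work starts."""
    if os.path.isdir(dest_path):
        raise IsADirectoryError(dest_path)


def write_output(src_path, dest_path):
    """Copy file contents only, leaving the destination's permission bits alone."""
    with open(src_path, 'rb') as src, open(dest_path, 'wb') as dest:
        # copyfileobj streams bytes between open file objects; it does not
        # try to change metadata on the destination, so /dev/null works.
        shutil.copyfileobj(src, dest)
```

Because the destination is created by an ordinary `open()`, a newly created file gets its permissions from the process umask.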
James R. Barlow
f248576994 Change instructions to point away from private tap 2018-02-19 17:33:58 -08:00
James R. Barlow
aac5b6de3b Update autobrew script to match final changes 2018-02-17 00:12:03 -08:00
James R. Barlow
24435f11e0 We are now in homebrew 2018-02-15 17:42:16 -08:00
James R. Barlow
a9da839c39 Add vector-only PDF test case (tag: v5.6.0) 2018-02-08 00:17:35 -08:00
James R. Barlow
fa2c0296d6 v5.6.0 release notes, docs 2018-02-07 16:48:04 -08:00
James R. Barlow
1dfc32d7e6 Preserve "text as curves" vector content
The checking logic was never updated to deal with a pure vector file with no text that needs an OCR layer. This is doable, so allow it.
2018-02-07 16:05:48 -08:00
James R. Barlow
4a61beae41 autobrew: use homebrew's built-in test fixture 2018-02-05 11:10:33 -08:00
James R. Barlow
bd30587bf1 Update depends_on order 2018-01-29 13:16:07 -08:00
James R. Barlow
e0070e3e18 Update Dockerfile to use Ubuntu 17.10 (issue #214) 2018-01-28 15:30:41 -08:00
James R. Barlow
019513696b Ghostscript spoof scripts did not report their --version correctly (tag: v5.5) 2018-01-10 17:08:14 -08:00
James R. Barlow
ad7a4476db hugemono.pdf needs --max-image-mpixels to pass with Pillow 5.0 2018-01-10 16:55:18 -08:00
James R. Barlow
4812b20fb2 Fix tesseract_noop.py generating wrong size of output PDF in tests
This caused trouble before with test_deskew
2018-01-10 16:35:31 -08:00
James R. Barlow
f5e07c9427 Fix Ghostscript parameter order 2018-01-10 16:33:26 -08:00
James R. Barlow
75dcb90621 Niceties: when environment variable overload is used, clarify we're not checking the PATH 2018-01-10 16:33:03 -08:00
James R. Barlow
dfc0434cc2 Update requirements to set Pillow to 5.0 2018-01-10 15:45:23 -08:00
James R. Barlow
882fc2257c Add --max-image-mpixels argument to support Pillow 5.0 2018-01-10 15:43:59 -08:00
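As a rough illustration of what a megapixel limit like this might translate to, the sketch below converts a megapixel value into a pixel count of the kind Pillow's `Image.MAX_IMAGE_PIXELS` attribute expects. The function name and the convention that 0 means "no limit" are assumptions for illustration, not confirmed behavior of the `--max-image-mpixels` argument.

```python
def mpixels_to_pixel_limit(max_image_mpixels):
    """Convert a limit in megapixels to a pixel count, or None for no limit."""
    if max_image_mpixels < 0:
        raise ValueError('megapixel limit must not be negative')
    if max_image_mpixels == 0:
        # Assumed convention: 0 disables the check. Pillow treats
        # MAX_IMAGE_PIXELS = None as "no limit".
        return None
    return int(max_image_mpixels * 1000000)


# The result would then be assigned before opening large images, e.g.:
#   PIL.Image.MAX_IMAGE_PIXELS = mpixels_to_pixel_limit(250)
```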
James R. Barlow
41e83b52fc Document process for redoing OCR 2018-01-10 15:39:58 -08:00
James R. Barlow
47758b4d8f Reactivate autobrew 2018-01-10 15:39:36 -08:00
James R. Barlow
6bf1f970a0 Fix some parameter validation for --output-type pdfa-1 and pdfa-2 2018-01-10 11:50:08 -08:00
James R. Barlow
7d451f101f Detect old versions of Ghostscript and warn about them (#208) 2018-01-10 11:47:39 -08:00
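A version check of this kind could look roughly like the sketch below, assuming the version string comes from something like `gs --version`. The threshold `MINIMUM_GS_VERSION` and the function names are hypothetical; the commit message does not state which versions count as old.

```python
import re
import warnings

# Hypothetical threshold for illustration; the real minimum is not stated here.
MINIMUM_GS_VERSION = (9, 15)


def parse_version(text):
    """Turn a version string such as '9.22' into a tuple of ints, (9, 22)."""
    return tuple(int(part) for part in re.findall(r'\d+', text))


def warn_if_old(version_text, minimum=MINIMUM_GS_VERSION):
    """Warn and return True when the reported version is below the minimum."""
    if parse_version(version_text) < minimum:
        warnings.warn('Ghostscript {} is older than {}'.format(
            version_text, '.'.join(str(n) for n in minimum)))
        return True
    return False
```

Comparing tuples of integers avoids the pitfalls of comparing version strings lexically, where '9.5' would sort after '9.15'.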
James R. Barlow
7edbfe0e40 Update autobrew template 2018-01-09 12:36:14 -08:00
James R. Barlow
91b42cbfa8 Fix issue in sandwich renderer when skipping OCR on a rotated and deskewed page
If OCR is skipped due to --tesseract-timeout or similar, and the skip page is rotated with /Rotate, and the skip page was deskewed or had other image processing, then the skip page was created with the wrong dimensions causing the output page to be cropped.
2018-01-09 00:17:53 -08:00
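The dimension problem described here comes down to whether width and height are swapped for a rotated page. A minimal sketch, assuming `/Rotate` is a multiple of 90 degrees as the PDF format requires; the function name `displayed_page_size` is hypothetical and this is not the actual fix from the commit.

```python
def displayed_page_size(width_pt, height_pt, rotate_degrees):
    """Return (width, height) of a page as displayed after applying /Rotate."""
    if rotate_degrees % 360 in (90, 270):
        # A quarter turn in either direction swaps the two dimensions.
        return height_pt, width_pt
    return width_pt, height_pt
```

A replacement page built with the unrotated size when the rotated size was needed (or the reverse) would come out cropped.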
James R. Barlow
6907df41b4 Disable autobrew until homebrew accepts the official release 2018-01-08 23:26:58 -08:00
James R. Barlow
2cebd90cbd Fix brew audit --strict warnings 2017-12-09 12:14:28 -08:00
James R. Barlow
376a121aaa Re-enable macos (tag: v5.4.4) 2017-11-29 15:06:19 -08:00
James R. Barlow
da11fd17ee qpdf dummy: needs to return version now 2017-11-29 14:35:37 -08:00
James R. Barlow
a40689a0ff tesseract: handle return of bytes properly in error cases 2017-11-29 14:35:26 -08:00
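One common way to handle subprocess output that may arrive as bytes containing invalid UTF-8 is sketched below. The helper name `decode_tool_output` is hypothetical, and the choice to replace undecodable bytes rather than raise is an assumption, not necessarily what this commit does.

```python
def decode_tool_output(raw):
    """Return tool output as str, tolerating bytes that are not valid UTF-8."""
    if isinstance(raw, bytes):
        # errors='replace' substitutes U+FFFD for undecodable bytes instead
        # of raising UnicodeDecodeError while reporting an error.
        return raw.decode('utf-8', errors='replace')
    return raw
```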
James R. Barlow
44a45fc3fb Add "bad UTF8 output from Tesseract" test 2017-11-29 14:08:07 -08:00