1367 Commits

Author SHA1 Message Date
Tucker Barbour
9fd9c7a51f
Scale BoundingBox and Text elements to account for additional space.
Here we are manually scaling the pt width used for the BoundingBox and
the Text element when manually adding whitespace to account for
limitations of the PDF.js viewer. This fixes an initial regression
noticed when selecting text elements in Chrome and PDFium. The width
of the Text element and BoundingBox had not been adjusted for the
additional whitespace so the highlighting was offset slightly.
2018-03-02 11:18:47 +00:00
Charles Forcey
422e619978 Add a note to the documentation about interword-spaces 2018-03-01 13:15:03 -05:00
Tucker Barbour
e6e34251c6 Add option to explicitly add interword spaces to HOCR pdf-renderer
This commit includes an optional workaround for limitations of the
PDF.js viewer described in
https://github.com/jbarlow83/OCRmyPDF/issues/133. Here we explicitly
add an additional space to text elements before drawing them on the PDF
canvas when using the HOCR renderer. This option does not apply to
other pdf renderers in OCRmyPDF and is turned off by default.
2018-03-01 13:15:03 -05:00
James R. Barlow
2d8aad1086 Improve docs 2018-03-01 00:24:38 -08:00
James R. Barlow
74ca736333 Issue #223: improve text of encrypted PDF error message 2018-02-27 15:08:22 -08:00
James R. Barlow
5e4fd8b0b9 compile_leptonica should have shebang 2018-02-24 12:49:36 -08:00
James R. Barlow
8ab8132411 lint: unused variables, wildcard imports 2018-02-24 12:48:52 -08:00
James R. Barlow
6899dd46e4 lint: remove extraneous backslash 2018-02-24 12:42:37 -08:00
James R. Barlow
8ad0697a20 lint: remove duplicate property definition 2018-02-24 12:42:03 -08:00
James R. Barlow
b47e5672e6 Remove old test case that no longer works 2018-02-24 12:40:14 -08:00
James R. Barlow
45c7bd9a60 lint: Remove shebangs from non-executable files 2018-02-24 12:38:58 -08:00
James R. Barlow
e7bcb95635 Fix pylint errors 2018-02-24 11:59:01 -08:00
James R. Barlow
3de83627a9 Handle output to /dev/null or directory (#219)
Previously we threw an exception if the output name was a directory (but only after doing OCR), and triggered a PermissionError when trying to flip the permission bits of /dev/null, due to the shutil.copyfile implementation. Instead of copying the file, use shutil.copyfileobj, which should also respect umask etc.
2018-02-19 22:15:07 -08:00
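The approach described in this commit can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the function name `write_output` and the early directory check are assumptions made for the example. It streams bytes with `shutil.copyfileobj` between two open file objects, so no permission bits are changed on the destination.

```python
import os
import shutil
import tempfile


def write_output(src_path, dest_path):
    """Copy bytes from src_path to dest_path without changing permission bits.

    Hypothetical helper for illustration only.
    """
    if os.path.isdir(dest_path):
        # Fail early instead of discovering the problem after the expensive work.
        raise IsADirectoryError(f"output is a directory: {dest_path}")
    with open(src_path, "rb") as src, open(dest_path, "wb") as dest:
        # copyfileobj only moves data between open file objects. A newly
        # created destination gets its mode from open(), which honours umask.
        shutil.copyfileobj(src, dest)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "in.pdf")
        with open(src, "wb") as f:
            f.write(b"%PDF-1.4 example bytes")
        write_output(src, os.devnull)  # data is discarded, no chmod attempted
        write_output(src, os.path.join(tmp, "out.pdf"))
```

Because the destination is simply opened for writing, special files such as the null device are handled the same way as regular files.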
James R. Barlow
f248576994 Change instructions to point away from private tap 2018-02-19 17:33:58 -08:00
James R. Barlow
aac5b6de3b Update autobrew script to match final changes 2018-02-17 00:12:03 -08:00
James R. Barlow
24435f11e0 We are now in homebrew 2018-02-15 17:42:16 -08:00
James R. Barlow
a9da839c39 Add vector-only PDF test case (tag: v5.6.0) 2018-02-08 00:17:35 -08:00
James R. Barlow
fa2c0296d6 v5.6.0 release notes, docs 2018-02-07 16:48:04 -08:00
James R. Barlow
1dfc32d7e6 Preserve "text as curves" vector content
The checking logic was never updated to deal with a pure vector file with no text that needs an OCR layer. This is doable, so allow it.
2018-02-07 16:05:48 -08:00
James R. Barlow
4a61beae41 autobrew: use homebrew's built-in test fixture 2018-02-05 11:10:33 -08:00
James R. Barlow
bd30587bf1 Update depends_on order 2018-01-29 13:16:07 -08:00
James R. Barlow
e0070e3e18 Update Dockerfile to use Ubuntu 17.10 (issue #214) 2018-01-28 15:30:41 -08:00
James R. Barlow
019513696b Ghostscript spoof scripts did not report their --version correctly (tag: v5.5) 2018-01-10 17:08:14 -08:00
James R. Barlow
ad7a4476db hugemono.pdf needs --max-image-mpixels to pass with Pillow 5.0 2018-01-10 16:55:18 -08:00
James R. Barlow
4812b20fb2 Fix tesseract_noop.py generating wrong size of output PDF in tests
This caused trouble before with test_deskew
2018-01-10 16:35:31 -08:00
James R. Barlow
f5e07c9427 Fix Ghostscript parameter order 2018-01-10 16:33:26 -08:00
James R. Barlow
75dcb90621 Niceties: when environment variable overload is used clarify we're not checking the PATH 2018-01-10 16:33:03 -08:00
James R. Barlow
dfc0434cc2 Update requirements to set Pillow to 5.0 2018-01-10 15:45:23 -08:00
James R. Barlow
882fc2257c Add --max-image-mpixels argument to support Pillow 5.0 2018-01-10 15:43:59 -08:00
James R. Barlow
41e83b52fc Document process for redoing OCR 2018-01-10 15:39:58 -08:00
James R. Barlow
47758b4d8f Reactivate autobrew 2018-01-10 15:39:36 -08:00
James R. Barlow
6bf1f970a0 Fix some parameter validation for --output-type pdfa-1 and pdfa-2 2018-01-10 11:50:08 -08:00
James R. Barlow
7d451f101f Detect old versions of Ghostscript and warn about them (#208) 2018-01-10 11:47:39 -08:00
James R. Barlow
7edbfe0e40 Update autobrew template 2018-01-09 12:36:14 -08:00
James R. Barlow
91b42cbfa8 Fix issue in sandwich renderer when skipping OCR on a rotated and deskewed page
If OCR is skipped due to --tesseract-timeout or similar, and the skip page is rotated with /Rotate, and the skip page was deskewed or had other image processing, then the skip page was created with the wrong dimensions causing the output page to be cropped.
2018-01-09 00:17:53 -08:00
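The dimension problem described in this commit comes down to how /Rotate affects the displayed page size. The sketch below is not the project's code; `displayed_size` is a hypothetical helper that only illustrates the general PDF rule that a rotation of 90 or 270 degrees swaps the displayed width and height, which a replacement page has to account for.

```python
def displayed_size(width, height, rotate):
    """Return (width, height) of a page as displayed, given its /Rotate value.

    Hypothetical helper for illustration only.
    """
    if rotate % 90 != 0:
        raise ValueError("/Rotate must be a multiple of 90")
    # Normalise negative or large values into the range 0..359.
    rotate %= 360
    if rotate in (90, 270):
        # A quarter turn swaps the displayed width and height.
        return height, width
    return width, height


if __name__ == "__main__":
    # US Letter in points, stored in portrait but displayed rotated.
    print(displayed_size(612, 792, 90))
```

A page built to stand in for a skipped one would need to match whichever of the two sizes the rest of the pipeline expects, otherwise the output is cropped.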
James R. Barlow
6907df41b4 Disable autobrew until homebrew accepts the official release 2018-01-08 23:26:58 -08:00
James R. Barlow
2cebd90cbd Fix brew audit --strict warnings 2017-12-09 12:14:28 -08:00
James R. Barlow
376a121aaa Re-enable macos (tag: v5.4.4) 2017-11-29 15:06:19 -08:00
James R. Barlow
da11fd17ee qpdf dummy: needs to return version now 2017-11-29 14:35:37 -08:00
James R. Barlow
a40689a0ff tesseract: handle return of bytes properly in error cases 2017-11-29 14:35:26 -08:00
James R. Barlow
44a45fc3fb Add "bad UTF8 output from Tesseract" test 2017-11-29 14:08:07 -08:00
James R. Barlow
ec4bb5359a Read tesseract's output as binary to avoid UnicodeDecodeErrors if it messes up 2017-11-29 13:44:40 -08:00
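The idea behind this change can be illustrated with a short sketch. It is an assumption-laden example, not the project's implementation: `run_and_decode` is a hypothetical helper, and a child Python process stands in for the external tool. The output is captured as raw bytes and decoded leniently, so malformed UTF-8 cannot raise a UnicodeDecodeError.

```python
import subprocess
import sys


def run_and_decode(args):
    """Run a command, capture its output as bytes, and decode it leniently.

    Hypothetical helper for illustration only.
    """
    proc = subprocess.run(args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    # No text mode is requested, so proc.stdout is bytes. Decoding with
    # errors="replace" turns invalid sequences into U+FFFD instead of raising.
    return proc.returncode, proc.stdout.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # A child process that deliberately writes bytes that are not valid UTF-8.
    child = "import sys; sys.stdout.buffer.write(b'ok \\xff\\xfe end')"
    code, text = run_and_decode([sys.executable, "-c", child])
    print(code, repr(text))
```

Replacing bad bytes keeps the rest of the message readable, which is usually what matters when the text is only used for logging or error reporting.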
James R. Barlow
d2217632df Rename _verify_python3_env 2017-11-29 13:43:18 -08:00
James R. Barlow
2cc044feed Move qpdf complaint to after options checking so that it won't break ocrmypdf --version 2017-11-29 13:42:55 -08:00
James R. Barlow
64fd0cb54f Remove test_qpdf.py only from travis 2017-11-29 12:54:48 -08:00
James R. Barlow
c5a1d22e81 That fixed it. Complain about old versions of qpdf now 2017-11-29 12:53:34 -08:00
James R. Barlow
a7b307af04 Looks like issue was negzero.pdf with qpdf 5.1.1 on travis, which is why osx passes
Reorganize and see if this is better now
2017-11-29 12:47:09 -08:00
James R. Barlow
3269eba16c Is it negzero.pdf? 2017-11-29 12:02:37 -08:00
James R. Barlow
d472860e3b Try to diagnose travis-only failure of qpdf test 2017-11-29 11:41:06 -08:00
James R. Barlow
731c9ea55e Set timeouts on the tests that seem to be stalling on travis (but not elsewhere) 2017-11-27 14:46:10 -08:00