2895 Commits

Author SHA1 Message Date
James R. Barlow
6d3f9ff15a
api: rework ocr() slightly to simplify variable handling 2020-11-03 17:10:52 -08:00
James R. Barlow
5d1d1a712b
docs: more details about macOS API changes
Due to the switch from fork to spawn.
2020-11-03 17:09:58 -08:00
James R. Barlow
6d5f8133e0
docs: show ifmain guard in example 2020-11-03 15:28:33 -08:00
James R. Barlow
13018d3d5c
ci: Extend test matrix to Python 3.9 2020-11-03 04:15:14 -08:00
James R. Barlow
14a85f9473
Fix pinned dependencies (tag: v11.3.2) 2020-11-03 04:12:47 -08:00
James R. Barlow
d22a1b3367
v11.3.2 release notes (2)
Since we never tagged it, fix other things.
2020-11-03 02:03:25 -08:00
James R. Barlow
b913e5dfef
ghostscript: don't repeat log in debug
Subprocess already does this for us.
2020-11-03 01:45:06 -08:00
James R. Barlow
dd8a5a4c72
Fix log domain names
ocrmypdf.subprocess.subprocess.ghostscript -> ocrmypdf.subprocess.ghostscript
2020-11-03 01:44:35 -08:00
James R. Barlow
36e9a54f02
Remove extraneous page rotation
This was added in commit b5ccbfd but seems to have been ill-advised.
2020-11-03 01:34:28 -08:00
James R. Barlow
3707af3b74
Change pdf.root to pdf.Root 2020-11-03 01:30:31 -08:00
James R. Barlow
ced7ad9164
unpaper: round off DPI 2020-11-03 01:14:57 -08:00
James R. Barlow
54bbbfdeb3
Fix UnboundLocalError when considering ImageMasks for optimization
Uncovered by the test file in issue #667, although unrelated to that issue.
2020-11-03 01:08:14 -08:00
James R. Barlow
7f73a6ed1e
Some Python 3.9 fixes 2020-11-03 00:45:47 -08:00
James R. Barlow
dce206d3dc
Fix pre-commit for Py3.9 2020-11-03 00:20:25 -08:00
James R. Barlow
9304c856cf
Merge branch 'master' of github.com:jbarlow83/OCRmyPDF 2020-11-02 02:47:36 -08:00
James R. Barlow
e5df98cbdf
v11.3.2 release notes 2020-11-02 02:43:32 -08:00
James R. Barlow
19bf3aeb00
api: improve typing 2020-11-02 02:33:34 -08:00
James R. Barlow
e86be0031c
unpaper: fix process output handling
With the ocrmypdf.subprocess wrapper, logging the output here
is redundant and loses the page number context.
2020-11-02 01:07:41 -08:00
James R. Barlow
6425977998
unpaper: use pnm instead of png
Some users reported problems with PNG recently; try PNM.

Fixes #665
Fixes #667
2020-11-02 01:05:56 -08:00
James R. Barlow
d57df2d980
subprocess: support programs that write their messages to stdout 2020-11-02 01:00:59 -08:00
James R. Barlow
664d0c7969
Document configure_debug_logging 2020-11-02 00:59:00 -08:00
James R. Barlow
a354663ee1
Fix typo in API documentation 2020-11-02 00:58:28 -08:00
Graham Miln
b21b048ec4
Add macOS brew language support (#615)
Note `brew` command for installing additional languages on macOS.
2020-10-30 01:09:06 -07:00
James R. Barlow
709c65b41a
v11.3.1 release notes (tag: v11.3.1) 2020-10-27 23:11:11 -07:00
James R. Barlow
67f99c5bb7
Endorse pdfminer.six 20201018 2020-10-27 23:09:45 -07:00
James R. Barlow
d55e673d9c
Fix warning about --pdfa-image-compression argument appearing at the wrong times
Closes #663
2020-10-27 23:09:45 -07:00
James R. Barlow
21b90d2d14
Endorse pikepdf 2.x 2020-10-27 23:09:45 -07:00
Edward Betts
2def7e3392
Use % for percentage in string format (#643) 2020-10-27 23:09:14 -07:00
James R. Barlow
b0dcaa7512
v11.3.0 release notes (tag: v11.3.0) 2020-10-24 03:19:32 -07:00
James R. Barlow
e8285b1d10
Add test to confirm rasterize_pdf_page rotates correctly 2020-10-24 03:10:59 -07:00
James R. Barlow
5ba56adb53
Fix page rotation issue (again)
Commit 1327ab3 introduced a fix for a regression, which was reported
in #581 and #634. It appears that the actual cause of this issue was
default parameters to rasterize_pdf_page in pluggy not working as
expected, causing a default rotation=0 even when a rotation was needed.
As such the OCR image was generated with the wrong orientation,
causing the initial regression and fix in commit 1327ab3.

Now that the real problem is identified, it's apparent that the logic
prior to 1327ab3 was sound, so we can revert 1327ab3, since doing so fixes
all known cases including #658.

This reverts 1327ab3, except that the improvements to rotation output are retained.
2020-10-24 02:45:21 -07:00
James R. Barlow
ca735278e0
setup: Version pluggy better 2020-10-24 02:35:41 -07:00
James R. Barlow
b5ccbfdf25
Fix hookspec of rasterize_pdf_page to remove default parameters 2020-10-24 02:35:18 -07:00
James R. Barlow
8c35d6e6e4
Fix debug log messages being suppressed from child processes 2020-10-22 02:20:06 -07:00
James R. Barlow
d1e0c81eda
Ensure worker_pdf is closed after gathering info in a thread
This is hacky, uses global state, but it does improve the situation for now.
2020-10-22 00:38:24 -07:00
James R. Barlow
10c8e4f8b4
Only create debug.log when running from the command line
When used as a library, ocrmypdf shouldn't make policy decisions, like where to
put a log file. Unsurprisingly, creating it causes problems for library users
because we deleted the temporary folder which held the log file and made no
effort to move it to a new location.

Also update the documentation to better describe how an application should
handle this.

Closes #657
2020-10-20 01:29:36 -07:00
James R. Barlow
6be2242c21
Describe "OCR" step as "Image processing" when --tesseract-timeout=0
Fixes #647
2020-10-08 01:03:42 -07:00
James R. Barlow
204c9d6ae1
Fix inverted colors during JBIG2 optimization on paletted images
Fixes #640
(tag: v11.2.1)
2020-10-07 04:08:50 -07:00
James R. Barlow
6eb393590b
v11.2.0 release notes
Change v11.1.3 to v11.2.0 since it contains functional changes.
(tag: v11.2.0)
2020-10-06 03:24:31 -07:00
James R. Barlow
07c6654057
v11.1.3 release notes 2020-10-06 03:22:48 -07:00
James R. Barlow
4e15eb8d14
Fix image optimization discarding image masks and soft masks associated with PNGs
Fixes #648
2020-10-06 03:20:54 -07:00
James R. Barlow
8b01ab8ad2
Better type checking on ocrmypdf.ocr(plugins=...) 2020-10-05 15:02:34 -07:00
James R. Barlow
e0a522ad50
Document the example plugin 2020-10-05 15:01:44 -07:00
James R. Barlow
a1a8788c5a
Merge branch 'master' of github.com:jbarlow83/OCRmyPDF (tag: v11.1.2) 2020-09-29 02:46:27 -07:00
James R. Barlow
cccdc178c3
v11.1.2 release notes 2020-09-29 02:46:18 -07:00
James R. Barlow
4eacb3454f
hOCR: write text in correct order
Fixes #642
2020-09-29 02:45:11 -07:00
Jimit Dholakia
82b8b41e80
docs: Add 'unpaper' optional dependency for Ubuntu 18.04 (#639) 2020-09-25 11:54:31 -07:00
James R. Barlow
581c5020ab
v11.1.1 release notes (tag: v11.1.1) 2020-09-25 00:28:38 -07:00
James R. Barlow
3ef8872a1e
pngquant driver: refactor, use streams instead of temporary files 2020-09-25 00:18:02 -07:00
James R. Barlow
28eec73eed
Tighten unpaper-args validation to exclude . and ..
Just in case
2020-09-25 00:18:02 -07:00