3031 Commits

Author SHA1 Message Date
James R. Barlow
ee23976858 Re-sequence plugin installation 2021-01-24 23:42:28 -08:00
James R. Barlow
f559316881 Insert setuptools plugins with ocrmypdf prefix 2021-01-24 23:42:09 -08:00
James R. Barlow
084610c242
Automate insertion of builtin modules 2021-01-24 23:41:50 -08:00
James R. Barlow
9ff627472b Update pre-commit 2021-01-24 23:40:59 -08:00
James R. Barlow
956310d1ec
Import PageContext, PdfContext since they are referenced in pluginspec 2021-01-24 02:04:47 -08:00
James R. Barlow
1a982da442 tests: confirm that we produce pdf when optimization is off 2021-01-24 01:54:25 -08:00
James R. Barlow
4879a1f0de
docs: no MS Store Python 2021-01-18 13:27:31 -08:00
James R. Barlow
ce66bcc9c8 github: Ask how ocrmypdf was installed 2021-01-10 02:35:24 -08:00
James R. Barlow
1ebf3144af v11.5.0 release notes v11.5.0 2021-01-09 16:48:40 -08:00
James R. Barlow
7a1cccbc4e Fallback to LeptonicaErrorTrap_Redirect if ffi.callback fails
Might fix issue #709, Apple silicon support.
2021-01-09 16:47:11 -08:00
James R. Barlow
ebacff1b39
tests: Fix debug logging test 2021-01-09 16:41:57 -08:00
James R. Barlow
c7c447be66
Add test for configure_debug_logging
Since we can't directly test it
2021-01-09 16:02:12 -08:00
James R. Barlow
91aa175602
Consider text when determining page raster DPI
Previously, if we found vectors of any sort on a page, we would bump
the DPI up to 400. We did nothing about pages with text. As a result,
pages with a low image resolution and printable text would have the
text downgraded to image resolution when --force-ocr was used.

We don't try to determine if the text is visible or invisible OCR text, since
that is a slower test. --redo-ocr would improve such cases anyway.
2021-01-09 16:01:49 -08:00
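
The DPI rule described in the commit above can be sketched roughly as follows. This is an illustration only, not OCRmyPDF's actual code: the function name, its parameters, and the assumption that text pages get the same 400 DPI floor as vector pages are all hypothetical.

```python
# Hypothetical sketch; names and the shared 400 DPI floor are assumptions.
VECTOR_OR_TEXT_DPI = 400


def choose_raster_dpi(image_dpi: float, has_vector: bool, has_text: bool) -> float:
    """Pick a rasterization DPI for a page.

    Pages that contain vector drawings or text are rendered at no less
    than VECTOR_OR_TEXT_DPI so that those elements are not degraded to a
    low embedded-image resolution.
    """
    if has_vector or has_text:
        return max(image_dpi, VECTOR_OR_TEXT_DPI)
    return image_dpi


# A 150 DPI scan with printable text is rendered at the higher floor.
low_res_with_text = choose_raster_dpi(150, has_vector=False, has_text=True)
# A pure image page keeps its own resolution.
image_only = choose_raster_dpi(150, has_vector=False, has_text=False)
```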
James R. Barlow
b267494e4a
Create raster PDF pages to match input page size
Previously we produced a raster image, then multiplied image width
by DPI to get the page size. However if there is rounding the
page size may not match exactly. In this modified approach we
constrain the page size to match.
2021-01-08 15:10:43 -08:00
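
The rounding problem described in the commit above can be illustrated with a small calculation. This is a sketch under assumptions, not the project's implementation: the helper names are hypothetical, and the A4 width of 595.276 pt at 300 DPI is just an example input.

```python
# Hypothetical helpers illustrating the rounding mismatch.
POINTS_PER_INCH = 72


def raster_pixels(page_pts: float, dpi: float) -> int:
    """Number of whole pixels needed to render a page dimension at dpi."""
    return round(page_pts / POINTS_PER_INCH * dpi)


def page_pts_from_pixels(pixels: int, dpi: float) -> float:
    """Page dimension implied by a pixel count at a nominal dpi."""
    return pixels / dpi * POINTS_PER_INCH


def constrained_dpi(page_pts: float, pixels: int) -> float:
    """Effective dpi that makes the raster cover the input page size exactly."""
    return pixels / (page_pts / POINTS_PER_INCH)


a4_width_pts = 595.276
pixels = raster_pixels(a4_width_pts, 300)       # 2480 pixels after rounding
derived = page_pts_from_pixels(pixels, 300)     # close to 595.2, not 595.276
effective = constrained_dpi(a4_width_pts, pixels)  # slightly below 300
```

Deriving the page size from the rounded pixel count loses a fraction of a point, while keeping the input page size and letting the effective DPI absorb the rounding preserves the dimensions.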
James R. Barlow
f687180ecc
tests: tidy pdfinfo 2021-01-08 15:04:52 -08:00
James R. Barlow
6f4b38b103
ghostscript: tidy comments 2021-01-08 00:41:03 -08:00
James R. Barlow
d32324859c
v11.4.5 release notes v11.4.5 2021-01-06 11:42:28 -08:00
James R. Barlow
48222b87b5 Merge branch 'master' of github.com:jbarlow83/OCRmyPDF 2021-01-06 03:59:40 -08:00
Jonas Winkler
62e5edc72b
fix unclosed file warnings. (#710)
Co-authored-by: Jonas Winkler <jonas.winkler@jpwinkler.de>
2021-01-06 03:59:28 -08:00
James R. Barlow
2846d46bb8
Remove .coveragerc and fold into setup.cfg 2021-01-06 03:58:18 -08:00
James R. Barlow
47ef1914d4
v11.4.4 release notes v11.4.4 2021-01-01 01:39:24 -08:00
James R. Barlow
df157552f3
Make ocrmypdf.ocr take a threading lock 2021-01-01 01:37:09 -08:00
James R. Barlow
0b3a526049
Partial fix crash on 'userunit' None (#700)
Our method of getting data from pdfminer would silently consume a StopIteration
if pdfminer returned no processed pages, leading to an odd error message.

We now handle the error from pdfminer properly and return a more
descriptive error of our own.

It would be possible for ocrmypdf to repair the file before sending it to
pdfminer, but this seems to be rare enough that we won't do that yet.
2021-01-01 01:11:32 -08:00
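
A general way to avoid a silently consumed StopIteration, as discussed in the commit above, is to ask the iterator for a default value and raise a descriptive error when nothing comes back. The sketch below shows the pattern only; `NoPagesError` and `first_page` are hypothetical names, not part of OCRmyPDF or pdfminer.

```python
# Hypothetical sketch of the pattern; not the actual fix in the commit.
from typing import Iterable, TypeVar

T = TypeVar("T")

_MISSING = object()  # sentinel so that a legitimate None page is not confused with "empty"


class NoPagesError(Exception):
    """Raised when the page source yields no pages at all."""


def first_page(pages: Iterable[T]) -> T:
    """Return the first page, or raise a descriptive error if there is none."""
    page = next(iter(pages), _MISSING)
    if page is _MISSING:
        raise NoPagesError("the PDF parser returned no processed pages")
    return page
```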
James R. Barlow
1e80d412fa
tesseract: fix typing of some optional arguments 2021-01-01 00:46:00 -08:00
James R. Barlow
df6e106203
concurrent: simplify results loop 2021-01-01 00:44:46 -08:00
James R. Barlow
bd0f005861
tests: tag tests that need pngquant, jbig2enc v11.4.3 2020-12-30 01:58:57 -08:00
James R. Barlow
6ba4b7b3f3
ci: temporarily disable pngquant on Windows
Looks like a packaging error, choco complains of bad hashes.
2020-12-30 01:40:56 -08:00
James R. Barlow
2c11349ee8 Merge branch 'master' of github.com:jbarlow83/OCRmyPDF 2020-12-29 21:40:46 -08:00
James R. Barlow
b0afef09ef
v11.4.3 release notes 2020-12-29 21:40:35 -08:00
James R. Barlow
72fa347c38
tests: skip metadata test for two pikepdf versions that warn incorrectly 2020-12-29 01:47:52 -08:00
James R. Barlow
96d68c2413
pipeline: refactor metadata_fixup 2020-12-29 01:47:32 -08:00
James R. Barlow
babc76fa74 tests: assert that most patched functions are called
We were not actually checking whether the functions we patched were
called when expected.
2020-12-28 23:58:33 -08:00
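
The kind of check described in the commit above can be sketched with the standard library's `unittest.mock`. The function under test here, `current_dir`, is a hypothetical stand-in rather than anything from the OCRmyPDF test suite.

```python
# Minimal sketch: patch a function and assert that the patch was actually hit.
import os
from unittest.mock import patch


def current_dir() -> str:
    """Hypothetical code under test that depends on os.getcwd."""
    return os.getcwd()


with patch("os.getcwd", return_value="/tmp/fake") as mock_getcwd:
    result = current_dir()
    # Without this assertion the test would still pass even if
    # current_dir() stopped calling os.getcwd entirely.
    mock_getcwd.assert_called_once()
```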
Tim Gates
dc06990e5d
docs: fix simple typo, instsalled -> installed (#704)
There is a small typo in docs/installation.rst.

Should read `installed` rather than `instsalled`.
2020-12-28 15:28:34 -08:00
James R. Barlow
0ff0d2f8d1
Remove PDF/A overprint debug message
Since we currently log all of a process's output at debug it's
redundant to log this separate message.
2020-12-27 16:19:05 -08:00
James R. Barlow
81602cf420
Fix test not patching properly after Ghostscript polling change 2020-12-27 16:01:50 -08:00
James R. Barlow
607e2d7e81
v11.4.2 release notes v11.4.2 2020-12-27 03:29:35 -08:00
James R. Barlow
b01d9e07e8
Deal with missing pthread_sigmask on Cygwin
Closes #701
2020-12-27 02:24:00 -08:00
James R. Barlow
91db94cf2e
watcher: fix OCR_LOGLEVEL env var not processed
Closes #702
2020-12-27 02:02:44 -08:00
James R. Barlow
416df803d4
pdfinfo: stricter typing 2020-12-24 22:39:00 -08:00
James R. Barlow
037b96ca16
pdfinfo: refactor to eliminate RawPageInfo 2020-12-24 02:57:44 -08:00
James R. Barlow
bb258fc99c
pdfinfo: Refactor pageinfo dictionary into a class 2020-12-24 01:47:53 -08:00
James R. Barlow
4b8ccbe8cb v11.4.1 release notes v11.4.1 2020-12-22 01:41:15 -08:00
James R. Barlow
ab1ff3331b
misc: synology fix
Accept user-contributed fix. Not testable.

Close #690.
2020-12-22 01:38:41 -08:00
James R. Barlow
3675ae918c
Fix certain invalid page ranges causing exception
Closes #686
2020-12-22 01:22:14 -08:00
James R. Barlow
0ba32b96b7 Revert "v11.4.0 release notes - remove change not actually implemented"
This reverts commit ad202693b3dcf905e180a665a54f349d00d8dfba.
Temporary folder prefix was actually changed in commit f11bb53e.
2020-12-22 00:47:25 -08:00
James R. Barlow
add64e4fa2 docs: com.github.ocrmypdf -> ocrmypdf.io 2020-12-22 00:46:42 -08:00
James R. Barlow
7fe2954ede
Change wheel tag to py36, update package_data to include py.typed 2020-12-12 16:49:04 -08:00
James R. Barlow
ad202693b3
v11.4.0 release notes - remove change not actually implemented
Remove a change that was pushed back to a future release.
2020-12-12 16:27:38 -08:00
James R. Barlow
594ef83551 v11.4.0 release notes v11.4.0 2020-12-11 15:09:49 -08:00
James R. Barlow
78b71618c1 Fix BufferedReader TypeError 2020-12-11 14:19:20 -08:00