2895 Commits

Author SHA1 Message Date
James R. Barlow
3bd5054634 lambda: move to extra_plugins folder 2021-01-24 23:47:26 -08:00
James R. Barlow
6a8dd65aa2 lambda: more issues related to new executor semantics
Now all tests pass, except for:
-tests that check the progress bar
-tests where xdist may or may not load a _lambda_plugin by running
some other test first before a test in optimize
2021-01-24 23:46:40 -08:00
James R. Barlow
6083b4f0a7 lambda: don't overrun number of workers needed 2021-01-24 23:46:40 -08:00
James R. Barlow
1a3ce59476 lambda: Don't be paranoid about exception marshalling
It works
2021-01-24 23:46:40 -08:00
James R. Barlow
c395436ba3 lambda: tidying, special casing use_threads 2021-01-24 23:46:40 -08:00
James R. Barlow
8d23d0b441 Operational lambda executor 2021-01-24 23:46:40 -08:00
James R. Barlow
c6a2716cdb Temporary move into package 2021-01-24 23:46:40 -08:00
James R. Barlow
5545bae76f lambda_plugin.py: doesn't work since entry point needs to be in package 2021-01-24 23:46:33 -08:00
James R. Barlow
7bccb8c748 tests: fix concurrency 2021-01-24 23:46:33 -08:00
James R. Barlow
173c0d1274 concurrency: lock progress pool
For API sanity and to communicate expectations. One progress pool at
a time is plenty of complexity.
2021-01-24 23:46:33 -08:00
James R. Barlow
6953f32465 pdfinfo: remove some messy concurrency handling
We can cut down on the use of global variables and save opening
an extra copy of the Pdf when threaded.
2021-01-24 23:46:33 -08:00
James R. Barlow
26b4d9bb4b Refactor concurrency so that it is pluggable
However, this may not be the best idea because it involves global
state that could be overridden by a parallel call to ocrmypdf.ocr.
2021-01-24 23:46:29 -08:00
James R. Barlow
34e564cd7d Use queue.Queue instead of multiprocessing.Queue in threaded mode 2021-01-24 23:45:26 -08:00
James R. Barlow
504d5776d2 Refactor plugin manager to eliminate callback 2021-01-24 23:42:40 -08:00
James R. Barlow
ee23976858 Re-sequence plugin installation 2021-01-24 23:42:28 -08:00
James R. Barlow
f559316881 Insert setuptools plugins with ocrmypdf prefix 2021-01-24 23:42:09 -08:00
James R. Barlow
084610c242
Automate insertion of builtin modules 2021-01-24 23:41:50 -08:00
James R. Barlow
9ff627472b Update pre-commit 2021-01-24 23:40:59 -08:00
James R. Barlow
956310d1ec
Import PageContext, PdfContext since they are referenced in pluginspec 2021-01-24 02:04:47 -08:00
James R. Barlow
1a982da442 tests: confirm that we produce pdf when optimization is off 2021-01-24 01:54:25 -08:00
James R. Barlow
4879a1f0de
docs: no MS Store Python 2021-01-18 13:27:31 -08:00
James R. Barlow
ce66bcc9c8 github: Ask how ocrmypdf was installed 2021-01-10 02:35:24 -08:00
James R. Barlow
1ebf3144af v11.5.0 release notes v11.5.0 2021-01-09 16:48:40 -08:00
James R. Barlow
7a1cccbc4e Fallback to LeptonicaErrorTrap_Redirect if ffi.callback fails
Might fix issue #709, Apple silicon support.
2021-01-09 16:47:11 -08:00
James R. Barlow
ebacff1b39
tests: Fix debug logging test 2021-01-09 16:41:57 -08:00
James R. Barlow
c7c447be66
Add test for configure_debug_logging
Since we can't directly test it
2021-01-09 16:02:12 -08:00
James R. Barlow
91aa175602
Consider text when determining page raster DPI
Previously if we found vectors of any sort on a page, we would bump
the DPI up to 400. We did nothing about pages with text. As a result,
pages with a low image resolution and printable text would have the
text downgraded to image resolution when --force-ocr was used.

We don't try to determine if the text is visible or invisible OCR text, since
that is a slower test. --redo-ocr would improve such cases anyway.
2021-01-09 16:01:49 -08:00
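The DPI rule described in the commit above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name, its parameters, and the choice to treat text exactly like vectors are hypothetical and are not taken from the OCRmyPDF source. Only the 400 DPI figure comes from the commit message.

```python
# Hypothetical sketch; names and exact policy are assumptions, not OCRmyPDF code.
VECTOR_PAGE_DPI = 400  # the value named in the commit message


def page_raster_dpi(image_dpi: float, has_vectors: bool, has_text: bool) -> float:
    """Pick a rasterization DPI for a page.

    Assumption: pages containing text are now treated the same way as
    pages containing vector drawings, so neither is rasterized below
    VECTOR_PAGE_DPI. Pages with only images keep their image DPI.
    """
    if has_vectors or has_text:
        return max(image_dpi, VECTOR_PAGE_DPI)
    return image_dpi
```

With this sketch, a 150 DPI scan that also carries printable text would be rasterized at 400 DPI instead of 150, which is the case the commit message describes for --force-ocr.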
James R. Barlow
b267494e4a
Create raster PDF pages to match input page size
Previously we produced a raster image, then multiplied image width
by DPI to get the page size. However if there is rounding the
page size may not match exactly. In this modified approach we
constrain the page size to match.
2021-01-08 15:10:43 -08:00
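The rounding problem in the commit above can be shown with a small arithmetic sketch. The helper names and the A4 example are illustrative assumptions, not code from the repository. The sketch assumes page sizes in PDF points (72 per inch) and an integer pixel width for the raster image.

```python
# Illustrative arithmetic only; helper names are hypothetical.
POINTS_PER_INCH = 72.0


def raster_width_px(page_width_pt: float, dpi: float) -> int:
    """Pixel width of a raster rendering; pixels must be whole numbers."""
    return round(page_width_pt / POINTS_PER_INCH * dpi)


def page_width_from_image(width_px: int, dpi: float) -> float:
    """Page width (in points) derived back from the image and its DPI."""
    return width_px * POINTS_PER_INCH / dpi


original_pt = 595.276  # approximate A4 width in points
px = raster_width_px(original_pt, 300)           # rounds to a whole pixel count
derived_pt = page_width_from_image(px, 300)      # no longer equals original_pt
constrained_pt = original_pt                     # the fix: reuse the input size
```

Because the pixel count is rounded, the size derived from the image drifts slightly from the input page size. Constraining the output page to the input size, as the commit describes, avoids that drift.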
James R. Barlow
f687180ecc
tests: tidy pdfinfo 2021-01-08 15:04:52 -08:00
James R. Barlow
6f4b38b103
ghostscript: tidy comments 2021-01-08 00:41:03 -08:00
James R. Barlow
d32324859c
v11.4.5 release notes v11.4.5 2021-01-06 11:42:28 -08:00
James R. Barlow
48222b87b5 Merge branch 'master' of github.com:jbarlow83/OCRmyPDF 2021-01-06 03:59:40 -08:00
Jonas Winkler
62e5edc72b
fix unclosed file warnings. (#710)
Co-authored-by: Jonas Winkler <jonas.winkler@jpwinkler.de>
2021-01-06 03:59:28 -08:00
James R. Barlow
2846d46bb8
Remove .coveragerc and fold into setup.cfg 2021-01-06 03:58:18 -08:00
James R. Barlow
47ef1914d4
v11.4.4 release notes v11.4.4 2021-01-01 01:39:24 -08:00
James R. Barlow
df157552f3
Make ocrmypdf.ocr take a threading lock 2021-01-01 01:37:09 -08:00
James R. Barlow
0b3a526049
Partial fix crash on 'userunit' None (#700)
Our method of getting data from pdfminer would silently consume a StopIteration
if pdfminer returned no processed pages, leading to an odd error message.

We now handle this error from pdfminer properly and return a more
descriptive error of our own.

It would be possible for ocrmypdf to repair the file before sending it to
pdfminer, but this seems to be rare enough that we won't do that yet.
2021-01-01 01:11:32 -08:00
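The general pitfall behind the commit above is that a bare next() on an empty iterator raises StopIteration, which surrounding iteration code can mistake for normal exhaustion. A minimal sketch of one way to avoid that follows. The exception class and function name are hypothetical, and this is not the actual OCRmyPDF or pdfminer code.

```python
# Hypothetical sketch; NoPagesError and first_item are illustrative names.
from typing import Iterable, TypeVar

T = TypeVar("T")


class NoPagesError(Exception):
    """Raised when the page source yields nothing at all."""


def first_item(items: Iterable[T]) -> T:
    """Return the first item, or raise a descriptive error if there is none.

    Passing a default to next() means StopIteration never escapes,
    so an empty result cannot be silently swallowed by a caller.
    """
    sentinel = object()
    item = next(iter(items), sentinel)
    if item is sentinel:
        raise NoPagesError("the PDF parser returned no processed pages")
    return item  # type: ignore[return-value]
```

A caller can then catch NoPagesError and report a clear message instead of failing later with an unrelated error.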
James R. Barlow
1e80d412fa
tesseract: fix typing of some optional arguments 2021-01-01 00:46:00 -08:00
James R. Barlow
df6e106203
concurrent: simplify results loop 2021-01-01 00:44:46 -08:00
James R. Barlow
bd0f005861
tests: tag tests that need pngquant, jbig2enc v11.4.3 2020-12-30 01:58:57 -08:00
James R. Barlow
6ba4b7b3f3
ci: temporarily disable pngquant on Windows
Looks like a packaging error, choco complains of bad hashes.
2020-12-30 01:40:56 -08:00
James R. Barlow
2c11349ee8 Merge branch 'master' of github.com:jbarlow83/OCRmyPDF 2020-12-29 21:40:46 -08:00
James R. Barlow
b0afef09ef
v11.4.3 release notes 2020-12-29 21:40:35 -08:00
James R. Barlow
72fa347c38
tests: skip metadata test for two pikepdf versions that warn incorrectly 2020-12-29 01:47:52 -08:00
James R. Barlow
96d68c2413
pipeline: refactor metadata_fixup 2020-12-29 01:47:32 -08:00
James R. Barlow
babc76fa74 tests: assert that most patched functions are called
We were not actually checking whether the functions we patched were
called when expected.
2020-12-28 23:58:33 -08:00
Tim Gates
dc06990e5d
docs: fix simple typo, instsalled -> installed (#704)
There is a small typo in docs/installation.rst.

Should read `installed` rather than `instsalled`.
2020-12-28 15:28:34 -08:00
James R. Barlow
0ff0d2f8d1
Remove PDF/A overprint debug message
Since we currently log all of a process's output at debug it's
redundant to log this separate message.
2020-12-27 16:19:05 -08:00
James R. Barlow
81602cf420
Fix test not patching properly after Ghostscript polling change 2020-12-27 16:01:50 -08:00
James R. Barlow
607e2d7e81
v11.4.2 release notes v11.4.2 2020-12-27 03:29:35 -08:00