579 Commits

Author SHA1 Message Date
James R. Barlow
6fbeb6347d Merge api (without plugins) 2019-07-27 02:04:01 -07:00
James R. Barlow
f83de20c37 Remove plugins (for now)
It's holding up too many other useful,
releasable changes.
2019-07-27 01:41:14 -07:00
James R. Barlow
12769b96e5 Drop support for omitting pdfminer.six 2019-07-10 13:37:01 -07:00
James R. Barlow
cbeddab35f rename ocrmypdf.run -> ocrmypdf.ocr 2019-07-07 02:11:44 -07:00
James R. Barlow
eeae6f8292 test: Add syntax checks for shell completions 2019-07-02 13:49:17 -07:00
James R. Barlow
9b60d3e285 Improve testing of _validation.py 2019-06-22 02:33:04 -07:00
James R. Barlow
c357d4146e Restructure ocrmypdf.pdfinfo 2019-06-20 03:10:41 -07:00
James R. Barlow
51ed381bfc Rename weave -> graft 2019-06-13 01:16:56 -07:00
James R. Barlow
16990890d8 Remove "from ocrmypdf import ocrmypdf"
Messes up future imports from ocrmypdf, so don't do it.
2019-06-12 17:52:25 -07:00
James R. Barlow
8b8de7cc1d Add new --pages feature to limit OCR to only specific pages 2019-06-12 17:27:47 -07:00
James R. Barlow
20ad032977 Fix some error messages that printed directly to sys.stderr instead of logging 2019-06-05 03:07:48 -07:00
James R. Barlow
eb5200d26a Change most tests to use ocrmypdf API instead of subprocess
The main benefit is that code coverage measurement can now actually follow the code under test.
Also removes most of the ugly os.environ hacks.
2019-06-03 01:45:27 -07:00
James R. Barlow
e73740ae9d test: remove test code that supports tess3 or tess4 testing 2019-06-03 01:33:24 -07:00
James R. Barlow
fb933edc0f Use newer pytest tmp_path API 2019-06-01 01:55:51 -07:00
James R. Barlow
ba41ccae1b conftest: don't modify PYTEST_CURRENT_TEST when manipulating os.environ
It confuses pytest.
2019-06-01 01:41:39 -07:00
James R. Barlow
8ed4e229f3 ghostscript: avoid log=None construct 2019-05-30 13:57:38 -07:00
James R. Barlow
9d5f23e961 Rename filters to plugins 2019-05-28 02:39:25 -07:00
James R. Barlow
7566d4b768 Introduce plugins/filters 2019-05-27 16:55:04 -07:00
James R. Barlow
5c4c32ab3c Remove multiprocessing tests - no longer valid 2019-05-27 12:07:20 -07:00
James R. Barlow
c14f62752b Tests: add an API test 2019-05-25 16:24:09 -07:00
James R. Barlow
5cecb3ecb4 Convert one test to use API 2019-05-22 23:53:48 -07:00
James R. Barlow
32a076c039 Refactor validation and exceptions
CLI now tracks check_options exceptions. API now works more like
an API, without an exception handler,
because the caller should provide one.
2019-05-20 18:01:17 -07:00
James R. Barlow
ef1ef1cdf0 Fix test invalidated by Python 3.6 logging fixes 2019-05-17 15:20:07 -07:00
James R. Barlow
4340ad9f12 Update test cache 2019-05-17 01:45:06 -07:00
James R. Barlow
8df1ea2754 Mark some slow tests 2019-05-17 01:42:27 -07:00
James R. Barlow
e528adc603 pylint removal 2019-05-17 01:09:06 -07:00
James R. Barlow
13ab23ba54 Refactor weave_layers, introduce progress bar
Fixes a bug in this branch where --sidecar would fail by trying to iterate over
the executor futures twice.
2019-05-16 14:57:31 -07:00
James R. Barlow
5e025c3382 Reinstate log level in messages to be closer to old behavior 2019-05-15 15:46:36 -07:00
James R. Barlow
486f73d5d6 Remove custom logger 2019-05-15 02:28:13 -07:00
James R. Barlow
c904b430b6 Merge master into api branch; all tests pass 2019-05-14 16:33:02 -07:00
James R. Barlow
0a72c12ff0 weave: add new test for link consistency 2019-05-12 03:36:33 -07:00
James R. Barlow
482cb788ed Don't use MagicMock() as a dummy logger in pytest 2019-05-11 12:44:17 -07:00
James R. Barlow
15a988b999 weave: use emplacement method, scrap TOC repair
The new emplacement method updates page objects in place without
generating new objgen numbers, meaning we no longer need to update the table
of contents to preserve links.
2019-05-11 12:40:25 -07:00
James R. Barlow
bcdd196699 ghostscript: remove unnecessary post-render resizing step 2019-05-11 12:10:50 -07:00
James R. Barlow
58c29ffb5c weave: use explicit pdf.close(), drastically reduce open file handles
With the new pikepdf 1.2.0 we no longer need to hold file handles
open because of the "copy to memory" functionality. We retain
the behavior of closing/reopening the output PDF every 100 pages as
a way to limit memory usage.
2019-04-18 15:12:48 -07:00
mawi
c92ccc6134 fix: tests 2019-04-08 14:57:42 +02:00
mawi
783a128bd1 feat: move to sync (non-ETL) implementation - remove ruffus 2019-04-04 21:02:38 +02:00
Martin Wind
a4667b5656 refactor: move ruffus related code to one file 2019-03-28 20:16:10 +01:00
Martin Wind
f65a3d3762 fix import in unpaper test 2019-03-26 10:04:26 +01:00
James R. Barlow
427afc0616 Fix LeptonicaErrorTrap when sys.stderr.fileno() is not available
The LeptonicaErrorTrap was problematic for Celery and other
libraries that mess with stderr.

Closes #359
2019-03-17 14:22:36 -07:00
James R. Barlow
486dc7e22c Fix some test failures missed in prev commit 2019-03-06 13:28:50 -08:00
James R. Barlow
dc616bb507 Fix test suite so --clean is not requested when unpaper is not installed 2019-03-05 22:33:13 -08:00
James R. Barlow
5da26e4c9c Convert most uses of subprocess.Popen to subprocess.run in test suite 2019-03-05 22:25:22 -08:00
James R. Barlow
a27ee3ee8c optimize: use Decode to invert 1bpp PNGs for now 2019-03-03 17:50:12 -08:00
James R. Barlow
58e6663806 Update test cache for french->german change 2019-03-03 03:23:59 -08:00
James R. Barlow
3f1d9ef99c Fix tests for move to Alpine dockerfile 2019-02-26 12:30:21 -08:00
James R. Barlow
19e35db2b7 Fix issue when weave handoff occurs with no OCR font present
If using --tesseract-timeout 0 and any image processing on a file with
more than 100 pages, the weave handoff will occur. Ensure this
works correctly even if no Glyphless font is present.

Closes #347
2019-02-10 02:05:59 -08:00
James R. Barlow
df688742d5 Fix exception on traversing corrupt ToC entries 2019-02-10 00:50:21 -08:00
James R. Barlow
f095e91cb4 unpaper-args: add test case and harden feature 2019-02-07 16:21:02 -08:00
James R. Barlow
f34b3015b2 Prevent Ghostscript from generating invalid XMP metadata
If DocumentInfo contains NULs, Ghostscript will generate XMP with
NULs, which is not allowed. Repair DocumentInfo before Ghostscript sees it.
2019-01-04 13:20:41 -08:00