James R. Barlow
6fbeb6347d
Merge api (without plugins)
2019-07-27 02:04:01 -07:00
James R. Barlow
f83de20c37
Remove plugins (for now)
...
It's holding up too many other useful,
releasable changes.
2019-07-27 01:41:14 -07:00
James R. Barlow
12769b96e5
Drop support for omitting pdfminer.six
2019-07-10 13:37:01 -07:00
James R. Barlow
cbeddab35f
rename ocrmypdf.run -> ocrmypdf.ocr
2019-07-07 02:11:44 -07:00
James R. Barlow
eeae6f8292
test: Add syntax checks for shell completions
2019-07-02 13:49:17 -07:00
James R. Barlow
9b60d3e285
Improve testing of _validation.py
2019-06-22 02:33:04 -07:00
James R. Barlow
c357d4146e
Restructure ocrmypdf.pdfinfo
2019-06-20 03:10:41 -07:00
James R. Barlow
51ed381bfc
Rename weave -> graft
2019-06-13 01:16:56 -07:00
James R. Barlow
16990890d8
Remove "from ocrmypdf import ocrmypdf"
...
Messes up future imports from ocrmypdf, so don't do it.
2019-06-12 17:52:25 -07:00
James R. Barlow
8b8de7cc1d
Add new --pages feature to limit OCR to only specific pages
2019-06-12 17:27:47 -07:00
James R. Barlow
20ad032977
Fix some error messages that printed directly to sys.stderr instead of logging
2019-06-05 03:07:48 -07:00
James R. Barlow
eb5200d26a
Change most tests to use ocrmypdf API instead of subprocess
...
The main benefit of this is that code coverage gains can actually follow it.
Also removes most ugly os.environ hacks.
2019-06-03 01:45:27 -07:00
James R. Barlow
e73740ae9d
test: remove test code that supports tess3 or tess4 testing
2019-06-03 01:33:24 -07:00
James R. Barlow
fb933edc0f
Use newer pytest tmp_path API
2019-06-01 01:55:51 -07:00
James R. Barlow
ba41ccae1b
conftest: don't modify PYTEST_CURRENT_TEST when manipulating os.environ
...
It confuses pytest.
2019-06-01 01:41:39 -07:00
James R. Barlow
8ed4e229f3
ghostscript: avoid log=None construct
2019-05-30 13:57:38 -07:00
James R. Barlow
9d5f23e961
Rename filters to plugins
2019-05-28 02:39:25 -07:00
James R. Barlow
7566d4b768
Introduce plugins/filters
2019-05-27 16:55:04 -07:00
James R. Barlow
5c4c32ab3c
Remove multiprocessing tests - no longer valid
2019-05-27 12:07:20 -07:00
James R. Barlow
c14f62752b
Tests: add an API test
2019-05-25 16:24:09 -07:00
James R. Barlow
5cecb3ecb4
Convert one test to use API
2019-05-22 23:53:48 -07:00
James R. Barlow
32a076c039
Refactor validation and exceptions
...
CLI now tracks check_options exceptions. API now works more like
an API, without an exception handler,
because the caller should provide one.
2019-05-20 18:01:17 -07:00
James R. Barlow
ef1ef1cdf0
Fix test invalidated by Python 3.6 logging fixes
2019-05-17 15:20:07 -07:00
James R. Barlow
4340ad9f12
Update test cache
2019-05-17 01:45:06 -07:00
James R. Barlow
8df1ea2754
Mark some slow tests
2019-05-17 01:42:27 -07:00
James R. Barlow
e528adc603
pylint removal
2019-05-17 01:09:06 -07:00
James R. Barlow
13ab23ba54
Refactor weave_layers, introduce progress bar
...
Fixes a bug in this branch where --sidecar would fail by trying to iterate over
the executor futures twice.
2019-05-16 14:57:31 -07:00
James R. Barlow
5e025c3382
Reinstate log level in messages to be closer to old behavior
2019-05-15 15:46:36 -07:00
James R. Barlow
486f73d5d6
Remove custom logger
2019-05-15 02:28:13 -07:00
James R. Barlow
c904b430b6
Merge master into api branch; all tests pass
2019-05-14 16:33:02 -07:00
James R. Barlow
0a72c12ff0
weave: add new test for link consistency
2019-05-12 03:36:33 -07:00
James R. Barlow
482cb788ed
Don't use MagicMock() as a dummy logger in pytest
2019-05-11 12:44:17 -07:00
James R. Barlow
15a988b999
weave: use emplacement method, scrap TOC repair
...
The new emplacement method updates page objects in place without
generating new objgen numbers, meaning we no longer need to update the table
of contents to preserve links.
2019-05-11 12:40:25 -07:00
James R. Barlow
bcdd196699
ghostscript: remove unnecessary post-render resizing step
2019-05-11 12:10:50 -07:00
James R. Barlow
58c29ffb5c
weave: use explicit pdf.close(), drastically reduce open file handles
...
With the new pikepdf 1.2.0 we no longer need to hold file handles
open because of the "copy to memory" functionality. We retain
the behavior of closing/reopening the output PDF every 100 pages as
a way to limit memory usage.
2019-04-18 15:12:48 -07:00
mawi
c92ccc6134
fix: tests
2019-04-08 14:57:42 +02:00
mawi
783a128bd1
feat: move to sync (non-ETL) implementation - remove ruffus
2019-04-04 21:02:38 +02:00
Martin Wind
a4667b5656
refactor: move ruffus related code to one file
2019-03-28 20:16:10 +01:00
Martin Wind
f65a3d3762
fix import in unpaper test
2019-03-26 10:04:26 +01:00
James R. Barlow
427afc0616
Fix LeptonicaErrorTrap when sys.stderr.fileno() is not available
...
The LeptonicaErrorTrap was problematic for Celery and other
libraries that mess with stderr.
Closes #359
2019-03-17 14:22:36 -07:00
James R. Barlow
486dc7e22c
Fix some test failures missed in prev commit
2019-03-06 13:28:50 -08:00
James R. Barlow
dc616bb507
Fix test suite so --clean is not requested when unpaper is not installed
2019-03-05 22:33:13 -08:00
James R. Barlow
5da26e4c9c
Convert most uses of subprocess.Popen to subprocess.run in test suite
2019-03-05 22:25:22 -08:00
James R. Barlow
a27ee3ee8c
optimize: use Decode to invert 1bpp PNGs for now
2019-03-03 17:50:12 -08:00
James R. Barlow
58e6663806
Update test cache for french->german change
2019-03-03 03:23:59 -08:00
James R. Barlow
3f1d9ef99c
Fix tests for move to Alpine dockerfile
2019-02-26 12:30:21 -08:00
James R. Barlow
19e35db2b7
Fix issue when weave handoff occurs with no OCR font present
...
If using --tesseract-timeout 0 and any image processing on a file with
more than 100 pages, the weave handoff will occur. Ensure this
works correctly even if no Glyphless font is present.
Closes #347
2019-02-10 02:05:59 -08:00
James R. Barlow
df688742d5
Fix exception on traversing corrupt ToC entries
2019-02-10 00:50:21 -08:00
James R. Barlow
f095e91cb4
unpaper-args: add test case and harden feature
2019-02-07 16:21:02 -08:00
James R. Barlow
f34b3015b2
Prevent Ghostscript from generating invalid XMP metadata
...
If DocumentInfo contains NULs, Ghostscript will generate XMP with
NULs, which is not allowed. Repair DocumentInfo before Ghostscript sees it.
2019-01-04 13:20:41 -08:00