71 Commits

Author SHA1 Message Date
James R. Barlow
380b981763 Remove most Python 3.6 special casing 2021-11-13 00:27:48 -08:00
James R. Barlow
790d3022f6 Implement --output-type=none to skip producing the PDF and use only the sidecar
Closes #787
2021-09-26 01:07:34 -07:00
James R. Barlow
906d77b389 tests: remove obsolete running_in_travis() 2021-04-07 02:25:10 -07:00
James R. Barlow
9416e850ff Remove another instance of helpers_namespace 2021-04-07 02:23:04 -07:00
James R. Barlow
aa115a8be3 Remove pytest_helpers_namespace 2021-04-07 01:56:51 -07:00
James R. Barlow
2846d46bb8 Remove .coveragerc and fold into setup.cfg 2021-01-06 03:58:18 -08:00
James R. Barlow
895fddd85e Replace most uses of universal_newlines with text
The parameters are equivalent but the latter is better named. Since
Python 3.6 doesn't support text= we use our wrapper to add it in that
place.

This is for subprocess.run.
2020-11-07 00:48:08 -08:00
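The equivalence described in the commit above can be sketched as follows, assuming Python 3.7 or later for the direct `text=` call. `run_compat` is a hypothetical name used only for illustration, not the project's actual wrapper; it shows one way `text=` could be forwarded as `universal_newlines=` on Python 3.6.

```python
import subprocess
import sys

# A child command that prints one line; sys.executable keeps it portable.
CMD = [sys.executable, "-c", "print('hello')"]


def run_compat(args, *, text=None, **kwargs):
    """Hypothetical wrapper: accept text= and forward it as universal_newlines=.

    subprocess.run() on Python 3.6 rejects text=, but universal_newlines=
    has the same effect there.
    """
    if text is not None:
        kwargs["universal_newlines"] = text
    return subprocess.run(args, **kwargs)


# Older spelling, accepted by every Python 3 release.
old = subprocess.run(CMD, stdout=subprocess.PIPE, universal_newlines=True, check=True)

# Newer spelling, available from Python 3.7.
new = subprocess.run(CMD, stdout=subprocess.PIPE, text=True, check=True)

# Wrapper spelling: callers can write text=True regardless of version.
wrapped = run_compat(CMD, stdout=subprocess.PIPE, text=True, check=True)
```

All three calls decode the child's output to `str` rather than returning `bytes`.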
James R. Barlow
aa0ec40102 Change license of all GPLv3 files to MPL-2.0
https://github.com/jbarlow83/OCRmyPDF/issues/600
2020-08-05 00:44:42 -07:00
James R. Barlow
48e2750551 Fix some tests that were failing in Docker 2020-06-21 01:48:13 -07:00
James R. Barlow
64891c2fc3 Pre-release delinting 2020-06-09 15:27:14 -07:00
James R. Barlow
0f942fb714 Rename ocrmypdf.exec -> ocrmypdf._exec 2020-06-09 14:59:09 -07:00
James R. Barlow
3b6f6782f0 Remove tesseract_env, --tesseract-env 2020-06-09 00:39:53 -07:00
James R. Barlow
21c0e045cb Remove _OCRMYPDF_TEST_PATH environment variable 2020-06-09 00:30:13 -07:00
James R. Barlow
ebbf68bd08 The big payoff: abolishing spoofing machinery 2020-06-09 00:08:20 -07:00
James R. Barlow
a9a473f2e5 Convert all tesseract cache usages to plugin 2020-06-05 17:55:18 -07:00
James R. Barlow
1598f2f0e5 Abolish spoof_tesseract_noop 2020-06-01 03:07:53 -07:00
James R. Barlow
2b23f7ec73 tesseract_noop: begin implementing with plugin 2020-06-01 02:45:49 -07:00
James R. Barlow
9bccff4f88 Move Tesseract specific arguments to plugin 2020-05-16 03:24:31 -07:00
James R. Barlow
2bd586e093 Compare requested languages to OCR engine instead of tesseract directly
Also refactoring to facilitate validation that needs the plugin manager.
2020-05-16 01:50:37 -07:00
James R. Barlow
41eb54cc0a Standardize tesseract.generate_hocr and _pdf parameters 2020-05-14 03:23:25 -07:00
James R. Barlow
12a2f78c4d Fix validation of languages not using tesseract_env
And some related issues.
2020-05-14 03:19:22 -07:00
James R. Barlow
85cbf94a6e Convert many uses of str paths to Path 2020-05-06 02:53:47 -07:00
James R. Barlow
c85278b31d Delinting 2020-05-03 00:53:29 -07:00
James R. Barlow
e02f6c1e97 Support plugin invocation with API 2020-05-02 03:34:31 -07:00
James R. Barlow
378e4dae3b Expand documentation for subprocess.run() from test 2020-03-04 13:37:44 -08:00
James R. Barlow
422ea9777e Remove session scope from fixtures
pytest seems to prepare os.environ in complex ways, so we want to ensure
these fixtures are not reused.
2019-12-31 17:09:23 -08:00
James R. Barlow
2f1c743227 Rewrite main pool loop
pytest-cov documentation recommends using explicit
management of multiprocessing.Pool rather than the context manager.
This is supposed to work better for collecting coverage data, particularly
on Windows.
2019-12-31 16:23:41 -08:00
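The pattern described in the commit above can be sketched as follows. This is a minimal illustration, not the project's actual pool loop: `run_jobs` and its `pool_factory` parameter are hypothetical names. The relevant difference is that leaving a `with Pool() as pool:` block calls `terminate()`, which may stop workers before they flush coverage data, whereas `close()` followed by `join()` lets them exit normally.

```python
import multiprocessing.pool  # binds `multiprocessing` and loads the pool submodule


def run_jobs(func, items, workers=2, pool_factory=multiprocessing.Pool):
    """Map func over items with explicit pool shutdown.

    pool_factory is a hypothetical hook so the same loop can be driven by
    multiprocessing.pool.ThreadPool when in-process execution is preferred.
    """
    pool = pool_factory(workers)
    try:
        return pool.map(func, items)
    finally:
        pool.close()  # no more work will be submitted
        pool.join()   # wait for workers to exit on their own
```

With process-based pools, the call is best placed under an `if __name__ == "__main__":` guard and `func` must be importable at module level, since some start methods re-import the main module in each worker.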
James R. Barlow
96ee21aee9 Try to set up subprocess coverage better 2019-12-31 15:39:45 -08:00
James R. Barlow
25d2b0cda4 test: environment warnings/cleanup 2019-12-30 22:38:50 -08:00
James R. Barlow
c5edff2c2f Sort imports 2019-12-19 15:31:18 -08:00
James R. Barlow
f6510e2b15 Document function of symlink shim 2019-12-06 15:00:12 -08:00
James R. Barlow
06a1f987d4 Use _OCRMYPDF_TEST_PATH for testing and .py stubs to simulate symlinks 2019-12-04 21:01:06 -08:00
James R. Barlow
43ab7c88d7 Remove os_environ() context manager 2019-12-04 17:37:38 -08:00
James R. Barlow
0cd424ffcb Enforce str-only environment for Windows since it's more strict 2019-12-04 17:14:27 -08:00
James R. Barlow
fde550f9a7 test: Replace many instances of run_ocrmypdf in subprocess with inline 2019-12-04 17:14:27 -08:00
James R. Barlow
3f92867ae6 Fix TypeError "environment can only contain strings"
Apparently Windows Python doesn't coerce pathlib.Path to str.
2019-12-04 17:13:51 -08:00
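A minimal sketch of the kind of fix the commit above describes, under the assumption that the environment mapping is built by hand before being passed to `subprocess.run`. `stringify_env` and the `EXAMPLE_DIR` variable are hypothetical names for illustration, not taken from the project.

```python
import os
import subprocess
import sys
from pathlib import Path


def stringify_env(env):
    """Return a copy of env in which path-like values are converted to str.

    Any other non-str key or value raises TypeError, so the stricter
    behavior is enforced on every platform rather than only on Windows.
    """
    clean = {}
    for key, value in env.items():
        if isinstance(value, os.PathLike):
            value = os.fspath(value)
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError(f"environment entry {key!r} must be str")
        clean[key] = value
    return clean


# Start from the current environment and add a pathlib.Path value.
child_env = stringify_env({**os.environ, "EXAMPLE_DIR": Path.cwd()})

# The child process echoes the variable back to show it arrived as text.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['EXAMPLE_DIR'])"],
    env=child_env,
    stdout=subprocess.PIPE,
    universal_newlines=True,
    check=True,
)
```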
James R. Barlow
7755c5c5a7 tests: fix interpretation of None as omitted argument 2019-08-11 16:58:22 -07:00
James R. Barlow
6fbeb6347d Merge api (without plugins) 2019-07-27 02:04:01 -07:00
James R. Barlow
12769b96e5 Drop support for omitting pdfminer.six 2019-07-10 13:37:01 -07:00
James R. Barlow
20ad032977 Fix some error messages that printed directly to sys.stderr instead of logging 2019-06-05 03:07:48 -07:00
James R. Barlow
eb5200d26a Change most tests to use ocrmypdf API instead of subprocess
The main benefit is that code coverage measurement can actually follow the tested code.
Also removes most ugly os.environ hacks.
2019-06-03 01:45:27 -07:00
James R. Barlow
fb933edc0f Use newer pytest tmp_path API 2019-06-01 01:55:51 -07:00
James R. Barlow
ba41ccae1b conftest: don't modify PYTEST_CURRENT_TEST when manipulating os.environ
It confuses pytest.
2019-06-01 01:41:39 -07:00
James R. Barlow
5cecb3ecb4 Convert one test to use API 2019-05-22 23:53:48 -07:00
James R. Barlow
dc616bb507 Fix test suite so --clean is not requested when unpaper is not installed 2019-03-05 22:33:13 -08:00
James R. Barlow
5da26e4c9c Convert most uses of subprocess.Popen to subprocess.run in test suite 2019-03-05 22:25:22 -08:00
James R. Barlow
f095e91cb4 unpaper-args: add test case and harden feature 2019-02-07 16:21:02 -08:00
James R. Barlow
8c0009c5c8 Make pdfminer.six optional
Mainly since the current release of pdfminer.six lacks a sdist, blocking
homebrew packaging. Also in case other distros don't accept pdfminer.six.
2018-12-31 01:08:43 -08:00
James R. Barlow
0880b16491 Sort imports with isort 2018-12-30 01:28:15 -08:00
James R. Barlow
06308a22ce Reformat with black 2018-12-30 01:27:49 -08:00