2676 Commits

Author SHA1 Message Date
James R. Barlow
8b8de7cc1d Add new --pages feature to limit OCR to only specific pages 2019-06-12 17:27:47 -07:00
James R. Barlow
aba293fd80 Change "Temporary working files" output message 2019-06-12 13:56:02 -07:00
James R. Barlow
066a293462 If verbose, print stacktrace on KeyboardInterrupt 2019-06-12 13:55:43 -07:00
James R. Barlow
0bbd6885e2 Make the go/no-go decision pluggable v8.4.0b1 2019-06-06 23:07:46 -07:00
James R. Barlow
5dd10c961c Docker: prefer streaming 2019-06-05 03:14:36 -07:00
James R. Barlow
81fc95556c Add progress bar for PdfInfo step 2019-06-05 03:08:04 -07:00
James R. Barlow
20ad032977 Fix some error messages that printed directly to sys.stderr instead of logging 2019-06-05 03:07:48 -07:00
James R. Barlow
93f1b73579 Fix --remove-vectors which was broken in API migration
It got dropped during the change. This feature has also been altered so that
the final visual appearance of the file is not affected, only the OCR image.
2019-06-05 02:04:45 -07:00
James R. Barlow
fd427a8ec1 plugins: replace path manipulation 2019-06-05 01:46:56 -07:00
James R. Barlow
9444cf357b optimize: add divide by zero check 2019-06-04 02:01:53 -07:00
James R. Barlow
5ab69153ee Fix .coveragerc 2019-06-03 02:26:49 -07:00
James R. Barlow
eb5200d26a Change most tests to use ocrmypdf API instead of subprocess
The main benefit is that code coverage can actually follow the code under test.
Also removes most ugly os.environ hacks.
2019-06-03 01:45:27 -07:00
James R. Barlow
98a3fda1f5 Drop support for Tesseract 4 alpha releases without textonly_pdf (mostly)
hocr renderer can still be used
2019-06-03 01:39:41 -07:00
James R. Barlow
e73740ae9d test: remove test code that supports tess3 or tess4 testing 2019-06-03 01:33:24 -07:00
James R. Barlow
fb933edc0f Use newer pytest tmp_path API 2019-06-01 01:55:51 -07:00
James R. Barlow
ba41ccae1b conftest: don't modify PYTEST_CURRENT_TEST when manipulating os.environ
It confuses pytest.
2019-06-01 01:41:39 -07:00
James R. Barlow
df9e286e9c Make bypassed exception clearer 2019-06-01 01:35:15 -07:00
James R. Barlow
b9d6e46572 shutil.rmtree: use builtin error suppression 2019-05-31 15:12:46 -07:00
James R. Barlow
8347c0d662 validation: remove dead code check_input_file 2019-05-31 01:57:08 -07:00
James R. Barlow
45a361d112 Add option to use threads instead of processes
Mainly since they are more convenient for debugging
2019-05-31 01:56:16 -07:00
James R. Barlow
522e1e948b ghostscript: don't use threads= for generate_pdfa
Not supported for pdfwrite
2019-05-31 01:55:29 -07:00
James R. Barlow
8ed4e229f3 ghostscript: avoid log=None construct 2019-05-30 13:57:38 -07:00
James R. Barlow
db29cae177 Docker docs: Remove legacy images, revive Ubuntu 2019-05-28 21:36:45 -07:00
James R. Barlow
d5b6cbb95e Update Ubuntu dockerfile 2019-05-28 15:36:50 -07:00
James R. Barlow
396c39978a Reorganize .docker folder so we don't have to rebuild as much 2019-05-28 14:18:54 -07:00
James R. Barlow
9d5f23e961 Rename filters to plugins 2019-05-28 02:39:25 -07:00
James R. Barlow
26a6232e1c Ignore DSStore 2019-05-28 02:33:35 -07:00
James R. Barlow
7566d4b768 Introduce plugins/filters 2019-05-27 16:55:04 -07:00
James R. Barlow
5c4c32ab3c Remove multiprocessing tests - no longer valid 2019-05-27 12:07:20 -07:00
James R. Barlow
692f7b3151 Dockerfile: with newer pip
Newer pip seems to install ocrmypdf-*.dist-info and, unlike -egg-info, has no
problem reporting the installed version, so skip copying.

Also move WORKDIR
2019-05-26 04:31:53 -07:00
James R. Barlow
8d0958d7ea Dockerfile: qpdf-dev needs to be requested explicitly 2019-05-26 04:30:34 -07:00
James R. Barlow
e9731b6bac Docker: upgrade pip, temporarily enable community repository for qpdf 2019-05-26 04:00:24 -07:00
James R. Barlow
0628a89041 docs: mention how to use Docker image shell 2019-05-26 00:20:40 -07:00
James R. Barlow
c14f62752b Tests: add an API test 2019-05-25 16:24:09 -07:00
James R. Barlow
24855045e1 Provisionally add filters 2019-05-25 16:23:39 -07:00
James R. Barlow
ed236e0c27 Begin API documentation 2019-05-24 01:05:32 -07:00
James R. Barlow
db6aa22eae Progress bar: unit types 2019-05-23 02:00:47 -07:00
James R. Barlow
805aa776ad Re-disable progress bar when not connected to tty 2019-05-23 02:00:35 -07:00
James R. Barlow
d0efdf643c Cleanup working files when done with a particular file, rather than end of process 2019-05-23 01:25:08 -07:00
James R. Barlow
22298b31be Fix distinction between clean and clean_final lost in API refactor 2019-05-23 01:19:58 -07:00
James R. Barlow
5cecb3ecb4 Convert one test to use API 2019-05-22 23:53:48 -07:00
James R. Barlow
a139e64c67 api: short-circuit exception handler, as caller should provide their own 2019-05-22 18:30:30 -07:00
James R. Barlow
db69b4d11a Improve argparse behavior for its role in making the API work 2019-05-22 15:55:48 -07:00
James R. Barlow
8bcb85720c release notes: clarify 2019-05-22 15:34:23 -07:00
James R. Barlow
09ca1bee97 Add progress bar to optimize and add option to disable it 2019-05-22 15:31:48 -07:00
James R. Barlow
23dd77ce0f api: fix progress_bar_friendly=False 2019-05-22 15:31:03 -07:00
James R. Barlow
32a076c039 Refactor validation and exceptions
CLI now tracks check_options exceptions. API now works more like an API,
without an exception handler, because the caller should provide one.
2019-05-20 18:01:17 -07:00
James R. Barlow
e4baa8c0dd Remove sys.exit() calls so we don't terminate caller application 2019-05-20 15:08:20 -07:00
James R. Barlow
2fdaa76a0d Refactor configure_logging 2019-05-20 14:54:34 -07:00
James R. Barlow
7ee0c52a57 Refactor cli into basic high level api 2019-05-19 22:34:45 -07:00
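
The API refactor commits above (7ee0c52a57, e4baa8c0dd, 32a076c039, a139e64c67) describe a common pattern: the library function raises exceptions and never calls sys.exit(), while only the command-line wrapper turns those exceptions into an exit code. A minimal sketch of that pattern follows. Every name in it (ExitCodeError, BadArgsError, run_job, cli_main) is hypothetical and chosen for illustration; none is taken from the ocrmypdf source.

```python
import sys


class ExitCodeError(Exception):
    """Hypothetical base error that carries the exit code a CLI would report."""

    exit_code = 1


class BadArgsError(ExitCodeError):
    """Hypothetical error for invalid arguments."""

    exit_code = 2


def run_job(pages):
    """Library entry point: raises on bad input instead of calling sys.exit()."""
    if not pages:
        raise BadArgsError("no pages selected")
    # Stand-in for real work: return the unique pages in ascending order.
    return sorted(set(pages))


def cli_main(pages):
    """CLI wrapper: the only layer that converts exceptions into an exit code."""
    try:
        run_job(pages)
    except ExitCodeError as e:
        print(f"error: {e}", file=sys.stderr)
        return e.exit_code
    return 0
```

A script would end with sys.exit(cli_main(...)), whereas an application embedding the library would call run_job directly and supply its own exception handling, so a failure never terminates the host process.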