2676 Commits

Author SHA1 Message Date
James R. Barlow
a005d14f91 Merge branch 'master' of github.com:jbarlow83/OCRmyPDF 2020-01-30 16:24:16 -08:00
Matthias Braun
6f66232d44 Fix typos, add instructions for training data (#477) 2020-01-30 16:24:41 -08:00
James R. Barlow
b8a780d684 Wait for file based on pikepdf 2020-01-30 12:40:48 -08:00
James R. Barlow
82f393dd09 Order of events 2020-01-30 12:40:19 -08:00
James R. Barlow
4952af1604 watcher: some refactoring 2020-01-28 12:56:19 -08:00
James R. Barlow
bcf77375c0 Fix grammar in output message 2020-01-28 07:33:28 -08:00
Ian Alexander
3eab161771 Update logging and env var extensibility 2020-01-20 10:45:28 -08:00
Ian Alexander
b7f38e976b Watched folder bug fixes, new flags, and docs updates. 2020-01-20 00:20:29 -08:00
James R. Barlow
a6567f2ae4 v9.5.0 release notes revised (tag v9.5.0) 2020-01-18 01:48:33 -08:00
James R. Barlow
e860c56b75 Fix regression: metadata updates not taking effect 2020-01-17 23:01:37 -08:00
James R. Barlow
2e15d52895 v9.5.0 release notes 2020-01-17 03:11:33 -08:00
James R. Barlow
ce97af5a79 Add OCR quality measurement API 2020-01-17 03:10:27 -08:00
James R. Barlow
3831c4cd4d Refactor metadata_fixup 2020-01-14 01:10:15 -08:00
James R. Barlow
61a2674317 Skip test that needs chmod when on Windows (tag v9.4.0) 2020-01-06 02:36:04 -08:00
James R. Barlow
9ad8cbf1f6 Fix assert that depends on POSIX-y file handling 2020-01-06 02:02:05 -08:00
James R. Barlow
123fde174d Don't use debug.log in pytest
pytest does not reset the state of logging if we install a file handler,
which will cause FileNotFoundError after the temporary folder is removed.

Semi-related:
https://github.com/pytest-dev/pytest/issues/5502
2020-01-06 01:46:19 -08:00
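The failure mode described in this commit message comes from a file handler that outlives the folder it writes to. As an illustration only (the commit itself simply stops using debug.log under pytest), here is a minimal sketch of a handler that is always detached and closed before its folder goes away. The names `temporary_file_log`, `demo`, and the logger name `sketch.debug_log` are hypothetical and do not come from OCRmyPDF.

```python
import logging
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def temporary_file_log(logger, log_path):
    """Attach a FileHandler to `logger` and always detach it on exit."""
    handler = logging.FileHandler(log_path)
    logger.addHandler(handler)
    try:
        yield handler
    finally:
        # Detach and close so no handler keeps pointing at a deleted folder.
        logger.removeHandler(handler)
        handler.close()


def demo():
    """Log into a temporary folder, then confirm the handler is gone."""
    logger = logging.getLogger("sketch.debug_log")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)
    with tempfile.TemporaryDirectory() as tmp:
        log_path = Path(tmp) / "debug.log"
        with temporary_file_log(logger, log_path):
            logger.debug("inside temporary folder")
        text = log_path.read_text()
    # The folder is removed here; logging again must not touch the old file.
    logger.debug("after cleanup")
    return text, len(logger.handlers)
```

Calling `demo()` returns the text written while the handler was attached and the number of handlers left on the logger afterwards, which should be zero.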
James R. Barlow
fd991a2380 Allow pdfminer.six 20200104 and update recommended versions 2020-01-05 21:37:28 -08:00
James R. Barlow
6f5d77d930 Also generate log file in temp folder on verbose mode 2020-01-05 21:33:32 -08:00
James R. Barlow
5169ac633b docs: mention pdfgrep too 2020-01-05 21:32:36 -08:00
James R. Barlow
5b6ab1e003 lept: improve lib not found error message
Closes #471
2020-01-05 01:05:53 -08:00
James R. Barlow
8f984bf958 docs: add note on limitations of sidecar file 2020-01-04 16:43:13 -08:00
James R. Barlow
9c5f0d0ec6 Eliminate last use of PyPDF2 from test suite 2020-01-04 16:32:01 -08:00
James R. Barlow
32041c43e1 tests: improve tesseract coverage 2020-01-04 02:35:14 -08:00
James R. Barlow
599028bebb tesseract: don't explicitly set lstm_use_matrix
Apparently tesseract does this on its own as needed.
2020-01-04 01:17:33 -08:00
James R. Barlow
6faa8f7221 logging: always log process arguments and stderr when at debug
Also remove ad-hoc logging of this information.
2020-01-01 16:48:48 -08:00
James R. Barlow
a4dc5e365f logging: fix incorrect usage: logging.Logger() 2020-01-01 16:47:36 -08:00
James R. Barlow
e2a563cc76 logging: create a debug log when -k parameter is issued 2020-01-01 16:47:15 -08:00
James R. Barlow
1037d73efb tests: use smaller files for ghostscript 2019-12-31 17:20:28 -08:00
James R. Barlow
aeb7b142a9 tests: skip tests not compatible with coverage
For reasons not entirely clear, stdout will get some data injected when
pytest-cov is running. Our tests that check for clean stdout need to
ignore this.

We check for an environment variable that is defined only when coverage is
running.
2019-12-31 17:10:51 -08:00
James R. Barlow
422ea9777e Remove session scope from fixtures
pytest seems to prepare os.environ in complex ways, so we want to ensure
these fixtures are not reused.
2019-12-31 17:09:23 -08:00
James R. Barlow
2f1c743227 Rewrite main pool loop
pytest-cov documentation recommends using explicit
management of multiprocessing.Pool rather than the context manager.
This is supposed to work better for collecting coverage data, particularly
on Windows.
2019-12-31 16:23:41 -08:00
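The commit message above contrasts the context manager with explicit management of multiprocessing.Pool. Leaving a `with Pool()` block calls `terminate()`, which stops workers immediately, whereas `close()` followed by `join()` lets them finish and exit normally. A minimal sketch of the explicit form follows; it is not the project's actual loop. The names `run_pool` and `square` are hypothetical, and the `pool_factory` parameter is an assumption added here so the same loop can be tried with a thread-backed pool.

```python
import multiprocessing
from multiprocessing.pool import ThreadPool


def square(x):
    """Stand-in for per-page work; must be importable for process pools."""
    return x * x


def run_pool(func, values, workers=2, pool_factory=multiprocessing.Pool):
    """Map `func` over `values`, shutting the pool down explicitly."""
    pool = pool_factory(processes=workers)
    try:
        results = pool.map(func, values)
    finally:
        # close() stops new submissions; join() waits for workers to exit
        # on their own instead of being terminated.
        pool.close()
        pool.join()
    return results


if __name__ == "__main__":
    # Process-backed pool, guarded so spawned children do not re-run it.
    print(run_pool(square, [1, 2, 3]))
```

Passing `pool_factory=ThreadPool` runs the same shutdown sequence on threads, which is convenient for a quick check without starting worker processes.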
James R. Barlow
96ee21aee9 Try to set up subprocess coverage better 2019-12-31 15:39:45 -08:00
James R. Barlow
4b759af6ff tests: fix problems with ghostscript spoofers 2019-12-31 15:33:03 -08:00
James R. Barlow
25d2b0cda4 test: environment warnings/cleanup 2019-12-30 22:38:50 -08:00
James R. Barlow
16dd8b54a8 ghostscript: don't delete output_file that will never exist
We stream output now, so no point in deleting.
2019-12-30 22:38:38 -08:00
James R. Barlow
c4dc5269d2 tests: remove some obscure things from coverage 2019-12-30 21:16:16 -08:00
James R. Barlow
c36e9950ae tests: test TqdmConsole 2019-12-30 17:51:09 -08:00
James R. Barlow
0c0d53b10f tests: AcroForm test case did not work correctly; fixed 2019-12-30 17:50:32 -08:00
James R. Barlow
63de7e1677 Improve error message for unreadable input files 2019-12-30 16:14:52 -08:00
James R. Barlow
b0e92760a2 tests: add coverage for helpers 2019-12-30 15:52:10 -08:00
James R. Barlow
054c0773a3 Update completions (tag v9.3.0) 2019-12-29 02:40:55 -08:00
James R. Barlow
89aa78b724 docs: fix obsolete statement to "brew install tesseract-lang"
Closes #469
2019-12-29 02:37:10 -08:00
James R. Barlow
708113a514 Windows: Remove Program Files cache from ocrmypdf.exec
@lru_cache doesn't work here, so let's just remove it.
2019-12-29 02:36:20 -08:00
James R. Barlow
95ef5410c2 azure: tweak windows script 2019-12-28 16:12:53 -08:00
James R. Barlow
868b3b4abd exec/init: os.get_exec_path() returns list not str 2019-12-28 16:11:08 -08:00
James R. Barlow
045bdff95a azure: homebrew broke something to do with python@2? 2019-12-28 16:10:43 -08:00
James R. Barlow
d12b27ac1d v9.3.0 release notes 2019-12-28 15:42:24 -08:00
James R. Barlow
e4e00de79f Add improved example demonstrating watched folder functionality
Closes #466
2019-12-28 15:37:42 -08:00
James R. Barlow
a53a3937c2 Fix exception on parsing Ghostscript error messages 2019-12-20 11:25:45 -08:00
James R. Barlow
343424b4d2 azure: only publish code coverage for macOS
macOS (due to Homebrew) currently has the most comprehensive code
coverage. Azure's code coverage feature does not merge code coverage,
so last task to finish wins.
2019-12-20 10:56:10 -08:00