James R. Barlow
a005d14f91
Merge branch 'master' of github.com:jbarlow83/OCRmyPDF
2020-01-30 16:24:16 -08:00
Matthias Braun
6f66232d44
Fix typos, add instructions for training data (#477)
2020-01-30 16:24:41 -08:00
James R. Barlow
b8a780d684
Wait for file based on pikepdf
2020-01-30 12:40:48 -08:00
James R. Barlow
82f393dd09
Order of events
2020-01-30 12:40:19 -08:00
James R. Barlow
4952af1604
watcher: some refactoring
2020-01-28 12:56:19 -08:00
James R. Barlow
bcf77375c0
Fix grammar in output message
2020-01-28 07:33:28 -08:00
Ian Alexander
3eab161771
Update logging and env var extensibility
2020-01-20 10:45:28 -08:00
Ian Alexander
b7f38e976b
Watched folder bug fixes, new flags, and docs updates.
2020-01-20 00:20:29 -08:00
James R. Barlow
a6567f2ae4
v9.5.0 release notes revised
v9.5.0
2020-01-18 01:48:33 -08:00
James R. Barlow
e860c56b75
Fix regression: metadata updates not taking effect
2020-01-17 23:01:37 -08:00
James R. Barlow
2e15d52895
v9.5.0 release notes
2020-01-17 03:11:33 -08:00
James R. Barlow
ce97af5a79
Add OCR quality measurement API
2020-01-17 03:10:27 -08:00
James R. Barlow
3831c4cd4d
Refactor metadata_fixup
2020-01-14 01:10:15 -08:00
James R. Barlow
61a2674317
Skip test that needs chmod when on Windows
v9.4.0
2020-01-06 02:36:04 -08:00
James R. Barlow
9ad8cbf1f6
Fix assert that depends on POSIX-y file handling
2020-01-06 02:02:05 -08:00
James R. Barlow
123fde174d
Don't use debug.log in pytest
pytest does not reset the state of logging if we install a file handler,
which will cause FileNotFoundError after the temporary folder is removed.
Semi-related:
https://github.com/pytest-dev/pytest/issues/5502
2020-01-06 01:46:19 -08:00
James R. Barlow
fd991a2380
Allow pdfminer.six 20200104 and update recommended versions
2020-01-05 21:37:28 -08:00
James R. Barlow
6f5d77d930
Also generate log file in temp folder on verbose mode
2020-01-05 21:33:32 -08:00
James R. Barlow
5169ac633b
docs: mention pdfgrep too
2020-01-05 21:32:36 -08:00
James R. Barlow
5b6ab1e003
lept: improve lib not found error message
Closes #471
2020-01-05 01:05:53 -08:00
James R. Barlow
8f984bf958
docs: add note on limitations of sidecar file
2020-01-04 16:43:13 -08:00
James R. Barlow
9c5f0d0ec6
Eliminate last use of PyPDF2 from test suite
2020-01-04 16:32:01 -08:00
James R. Barlow
32041c43e1
tests: improve tesseract coverage
2020-01-04 02:35:14 -08:00
James R. Barlow
599028bebb
tesseract: don't explicitly set lstm_use_matrix
Apparently tesseract does this on its own as needed.
2020-01-04 01:17:33 -08:00
James R. Barlow
6faa8f7221
logging: always log process arguments and stderr when at debug
Also remove ad-hoc logging of this information.
2020-01-01 16:48:48 -08:00
James R. Barlow
a4dc5e365f
logging: fix incorrect usage: logging.Logger()
2020-01-01 16:47:36 -08:00
James R. Barlow
e2a563cc76
logging: create a debug log when -k parameter is issued
2020-01-01 16:47:15 -08:00
James R. Barlow
1037d73efb
tests: use smaller files for ghostscript
2019-12-31 17:20:28 -08:00
James R. Barlow
aeb7b142a9
tests: skip tests not compatible with coverage
For reasons not entirely clear, stdout will get some data injected when
pytest-cov is running. Our tests that check for clean stdout need to
ignore this. We check for an environment variable that is defined only
when coverage is running.
2019-12-31 17:10:51 -08:00
James R. Barlow
422ea9777e
Remove session scope from fixtures
pytest seems to prepare os.environ in complex ways, so we want to ensure
these fixtures are not reused.
2019-12-31 17:09:23 -08:00
James R. Barlow
2f1c743227
Rewrite main pool loop
pytest-cov documentation recommends using explicit management of
multiprocessing.Pool rather than the context manager. This is supposed
to work better for collecting coverage data, particularly on Windows.
2019-12-31 16:23:41 -08:00
James R. Barlow
96ee21aee9
Try to set up subprocess coverage better
2019-12-31 15:39:45 -08:00
James R. Barlow
4b759af6ff
tests: fix problems with ghostscript spoofers
2019-12-31 15:33:03 -08:00
James R. Barlow
25d2b0cda4
test: environment warnings/cleanup
2019-12-30 22:38:50 -08:00
James R. Barlow
16dd8b54a8
ghostscript: don't delete output_file that will never exist
We stream output now, so no point in deleting.
2019-12-30 22:38:38 -08:00
James R. Barlow
c4dc5269d2
tests: remove some obscure things from coverage
2019-12-30 21:16:16 -08:00
James R. Barlow
c36e9950ae
tests: test TqdmConsole
2019-12-30 17:51:09 -08:00
James R. Barlow
0c0d53b10f
tests: AcroForm test case did not work correctly; fixed
2019-12-30 17:50:32 -08:00
James R. Barlow
63de7e1677
Improve error message for unreadable input files
2019-12-30 16:14:52 -08:00
James R. Barlow
b0e92760a2
tests: add coverage for helpers
2019-12-30 15:52:10 -08:00
James R. Barlow
054c0773a3
Update completions
v9.3.0
2019-12-29 02:40:55 -08:00
James R. Barlow
89aa78b724
docs: fix obsolete statement to "brew install tesseract-lang"
Closes #469
2019-12-29 02:37:10 -08:00
James R. Barlow
708113a514
Windows: Remove Program Files cache from ocrmypdf.exec
@lru_cache doesn't work here, so let's just remove it.
2019-12-29 02:36:20 -08:00
James R. Barlow
95ef5410c2
azure: tweak windows script
2019-12-28 16:12:53 -08:00
James R. Barlow
868b3b4abd
exec/init: os.get_exec_path() returns list not str
2019-12-28 16:11:08 -08:00
James R. Barlow
045bdff95a
azure: homebrew broke something to do with python@2?
2019-12-28 16:10:43 -08:00
James R. Barlow
d12b27ac1d
v9.3.0 release notes
2019-12-28 15:42:24 -08:00
James R. Barlow
e4e00de79f
Add improved example demonstrating watched folder functionality
Closes #466
2019-12-28 15:37:42 -08:00
James R. Barlow
a53a3937c2
Fix exception on parsing Ghostscript error messages
2019-12-20 11:25:45 -08:00
James R. Barlow
343424b4d2
azure: only publish code coverage for macOS
macOS (due to Homebrew) currently has the most comprehensive code
coverage. Azure's code coverage feature does not merge code coverage,
so the last task to finish wins.
2019-12-20 10:56:10 -08:00