James R. Barlow
c591f9601a
Remove Latin hOCR test
2023-11-19 23:51:27 -08:00
James R. Barlow
95b14ee282
Refactor lossless reconstruction setter into separate function
...
Still messy but good enough as a start.
2023-10-24 00:52:31 -07:00
James R. Barlow
ea36aedb5f
Overhaul version checkers to prefer Version to str
2023-09-25 00:59:44 -07:00
James R. Barlow
5124daa79f
Fix test failures from preceding
2023-06-19 23:25:31 -07:00
James R. Barlow
b7eb93eb79
Adopt ruff and fix prelim lints
2023-04-14 00:19:17 -07:00
James R. Barlow
46d0978a09
Update version scripts to support Ghostscript 10.0
2022-10-03 21:59:31 -07:00
James R. Barlow
c2ccc7f29d
Fix test failure due to new logging from pikepdf
2022-09-21 01:00:08 -07:00
James R. Barlow
acc70036cc
Set minimum Tesseract to 4.1.1
2022-08-02 15:20:29 -07:00
James R. Barlow
67773da309
Drop support for Ghostscript <9.50
2022-08-02 15:01:10 -07:00
James R. Barlow
5fe3102e4e
tests: new test to confirm correct printing of tesseract install advice
2022-08-01 12:31:37 -07:00
James R. Barlow
5b57520c98
tests: simplify some validation tests
2022-08-01 12:31:05 -07:00
James R. Barlow
30e4198f3a
tests: fix test_validation when chi_sim not installed
2022-08-01 02:47:39 -07:00
James R. Barlow
ba372e5841
Reorganize validation to fix exception when Tesseract not installed
...
The existing logic would call an OCR plugin's get_languages function before
allowing the plugin to check if its dependencies were available. This caused
an exception if Tesseract was not installed, when we were supposed to issue
an error message advising the user to install Tesseract.
2022-08-01 02:04:09 -07:00
James R. Barlow
80ed2117cc
Change to SPDX license tracking
2022-07-28 01:10:07 -07:00
James R. Barlow
dc6f1a266a
Modernize type annotations
2022-07-23 00:39:24 -07:00
James R. Barlow
17a5b8b43c
Refactor reporting of optimization failures
2022-06-13 01:30:15 -07:00
James R. Barlow
61069660a2
Move optimization options to plugin
2022-06-12 02:42:16 -07:00
James R. Barlow
f91faf9795
Add new argument --tesseract-thresholding to control tesseract thresholding where available
...
Also add missing test for --tesseract-oem
2021-12-06 15:38:14 -08:00
James R. Barlow
6c34d59836
tesseract: yet another version variant
2021-11-04 00:17:18 -07:00
James R. Barlow
790d3022f6
Implement --output-type=none to skip producing the PDF and use only the sidecar
...
Closes #787
2021-09-26 01:07:34 -07:00
James R. Barlow
5f01c5e330
Fix another species of Tesseract version number breaking regex
...
Fixes #795
2021-06-16 00:09:03 -07:00
James R. Barlow
7b1e5b4f41
Fix "invalid version number" for untagged tesseract versions
...
Fixes #770
2021-04-26 01:18:07 -07:00
James R. Barlow
336d274a54
Drop remnants of support for Tesseract without has_textonly_pdf
...
Also improve Tesseract version checking so it can compare all of their
weird conventions.
2021-04-07 23:05:21 -07:00
James R. Barlow
173a80864d
Delinting
2021-04-07 02:09:45 -07:00
James R. Barlow
aa115a8be3
Remove pytest_helpers_namespace
2021-04-07 01:56:51 -07:00
James R. Barlow
a4e1f8e1f3
Merge branch 'feature/lambda'
2021-04-01 16:36:22 -07:00
James R. Barlow
079c162a96
Ensure sidecar is not input or output file
2021-03-05 00:29:42 -08:00
Dima Kuznetsov
5e2206bae7
Allow --sidecar along --pages (#735)
2021-02-19 16:55:35 -08:00
James R. Barlow
16bda74974
Refactor - decouple progressbar from executor
2021-01-30 20:42:00 -08:00
James R. Barlow
d274d88929
Refactor to eliminate global state in _concurrent
2021-01-30 17:36:30 -08:00
James R. Barlow
7bccb8c748
tests: fix concurrency
2021-01-24 23:46:33 -08:00
James R. Barlow
babc76fa74
tests: assert that most patched functions are called
...
We were not actually checking if functions we patched were called when
expected.
2020-12-28 23:58:33 -08:00
James R. Barlow
3707af3b74
Change pdf.root to pdf.Root
2020-11-03 01:30:31 -08:00
James R. Barlow
bfe4a5b329
Tidy a log message
2020-09-25 00:17:57 -07:00
James R. Barlow
e6a7b58863
Merge branch 'de-gpl'
2020-08-12 12:20:38 -07:00
James R. Barlow
bed74501fc
Fix test breakage in validation
...
Broken in commit 4cc0dc
2020-08-05 01:35:26 -07:00
James R. Barlow
aa0ec40102
Change license of all GPLv3 files to MPL-2.0
...
https://github.com/jbarlow83/OCRmyPDF/issues/600
2020-08-05 00:44:42 -07:00
James R. Barlow
892db88f0e
test_two_languages: use narrower test
2020-06-12 14:33:02 -07:00
James R. Barlow
eeb44f78cc
Fix tests that failed on other platforms from previous fix
2020-06-12 12:59:46 -07:00
James R. Barlow
c6b9a49cbb
Fix tests that fail in CI
2020-06-10 17:08:00 -07:00
James R. Barlow
64891c2fc3
Pre-release delinting
2020-06-09 15:27:14 -07:00
James R. Barlow
fe156db41d
Merge branch 'release/v10' into trialmerge
2020-06-09 15:12:56 -07:00
James R. Barlow
0f942fb714
Rename ocrmypdf.exec -> ocrmypdf._exec
2020-06-09 14:59:09 -07:00
James R. Barlow
b109445215
Move Ghostscript rasterize_pdf to plugin
2020-06-08 17:10:27 -07:00
James R. Barlow
5e14d5b0dd
Fix test_report_file_size
...
Use more realistic test data
2020-06-03 13:24:55 -07:00
James R. Barlow
d43212d30b
Refactor --language argument into set
2020-05-25 03:20:10 -07:00
James R. Barlow
a0f9ca3a30
Move Tesseract options validation into plugin
2020-05-25 01:31:46 -07:00
James R. Barlow
9bccff4f88
Move Tesseract specific arguments to plugin
2020-05-16 03:24:31 -07:00
James R. Barlow
2bd586e093
Compare requested languages to OCR engine instead of tesseract directly
...
Also refactoring to facilitate validation needing the plugin manager.
2020-05-16 01:50:37 -07:00
James R. Barlow
2541f6cf89
Fix missing jbig2enc reported as error with -O3 instead of warning
...
Fixes #558
2020-05-12 01:05:57 -07:00