68 Commits

Author SHA1 Message Date
James R. Barlow
c591f9601a
Remove Latin hOCR test 2023-11-19 23:51:27 -08:00
James R. Barlow
95b14ee282
Refactor lossless reconstruction setter into separate function
Still messy but good enough as a start.
2023-10-24 00:52:31 -07:00
James R. Barlow
ea36aedb5f
Overhaul version checkers to prefer Version to str 2023-09-25 00:59:44 -07:00
James R. Barlow
5124daa79f
Fix test failures from preceding 2023-06-19 23:25:31 -07:00
James R. Barlow
b7eb93eb79
Adopt ruff and fix prelim lints 2023-04-14 00:19:17 -07:00
James R. Barlow
46d0978a09
Update version scripts to support Ghostscript 10.0 2022-10-03 21:59:31 -07:00
James R. Barlow
c2ccc7f29d
Fix test failure due to new logging from pikepdf 2022-09-21 01:00:08 -07:00
James R. Barlow
acc70036cc
Set minimum Tesseract to 4.1.1 2022-08-02 15:20:29 -07:00
James R. Barlow
67773da309
Drop support for Ghostscript <9.50 2022-08-02 15:01:10 -07:00
James R. Barlow
5fe3102e4e
tests: new test to confirm correct printing of tesseract install advice 2022-08-01 12:31:37 -07:00
James R. Barlow
5b57520c98
tests: simplify some validation tests 2022-08-01 12:31:05 -07:00
James R. Barlow
30e4198f3a
tests: fix test_validation when chi_sim not installed 2022-08-01 02:47:39 -07:00
James R. Barlow
ba372e5841
Reorganize validation to fix exception when Tesseract not installed
The existing logic would call an OCR plugin's get_languages function before
allowing the plugin to check if its dependencies were available. This caused
an exception if Tesseract was not installed, when we were supposed to issue
an error message advising the user to install Tesseract.
2022-08-01 02:04:09 -07:00
James R. Barlow
80ed2117cc
Change to SPDX license tracking 2022-07-28 01:10:07 -07:00
James R. Barlow
dc6f1a266a
Modernize type annotations 2022-07-23 00:39:24 -07:00
James R. Barlow
17a5b8b43c
Refactor reporting of optimization failures 2022-06-13 01:30:15 -07:00
James R. Barlow
61069660a2
Move optimization options to plugin 2022-06-12 02:42:16 -07:00
James R. Barlow
f91faf9795
Add new argument --tesseract-thresholding to control tesseract thresholding where available
Also add missing test for --tesseract-oem
2021-12-06 15:38:14 -08:00
James R. Barlow
6c34d59836
tesseract: yet another version variant 2021-11-04 00:17:18 -07:00
James R. Barlow
790d3022f6
Implement --output-type=none to skip producing the PDF and use only the sidecar
Closes #787
2021-09-26 01:07:34 -07:00
James R. Barlow
5f01c5e330
Fix another species of Tesseract version number breaking regex
Fixes #795
2021-06-16 00:09:03 -07:00
James R. Barlow
7b1e5b4f41
Fix "invalid version number" for untagged tesseract versions
Fixes #770
2021-04-26 01:18:07 -07:00
James R. Barlow
336d274a54
Drop remnants of support for Tesseract without has_textonly_pdf
Also improve Tesseract version checking so it can compare all of their
weird conventions.
2021-04-07 23:05:21 -07:00
James R. Barlow
173a80864d
Delinting 2021-04-07 02:09:45 -07:00
James R. Barlow
aa115a8be3
Remove pytest_helpers_namespace 2021-04-07 01:56:51 -07:00
James R. Barlow
a4e1f8e1f3
Merge branch 'feature/lambda' 2021-04-01 16:36:22 -07:00
James R. Barlow
079c162a96
Ensure sidecar is not input or output file 2021-03-05 00:29:42 -08:00
Dima Kuznetsov
5e2206bae7
Allow --sidecar alongside --pages (#735) 2021-02-19 16:55:35 -08:00
James R. Barlow
16bda74974
Refactor - decouple progressbar from executor 2021-01-30 20:42:00 -08:00
James R. Barlow
d274d88929
Refactor to eliminate global state in _concurrent 2021-01-30 17:36:30 -08:00
James R. Barlow
7bccb8c748
tests: fix concurrency 2021-01-24 23:46:33 -08:00
James R. Barlow
babc76fa74
tests: assert that most patched functions are called
We were not actually checking whether the functions we patched were called
when expected.
2020-12-28 23:58:33 -08:00
James R. Barlow
3707af3b74
Change pdf.root to pdf.Root 2020-11-03 01:30:31 -08:00
James R. Barlow
bfe4a5b329
Tidy a log message 2020-09-25 00:17:57 -07:00
James R. Barlow
e6a7b58863
Merge branch 'de-gpl' 2020-08-12 12:20:38 -07:00
James R. Barlow
bed74501fc
Fix test breakage in validation
Broken in commit 4cc0dc
2020-08-05 01:35:26 -07:00
James R. Barlow
aa0ec40102
Change license of all GPLv3 files to MPL-2.0
https://github.com/jbarlow83/OCRmyPDF/issues/600
2020-08-05 00:44:42 -07:00
James R. Barlow
892db88f0e
test_two_languages: use narrower test 2020-06-12 14:33:02 -07:00
James R. Barlow
eeb44f78cc
Fix tests that failed on other platforms from previous fix 2020-06-12 12:59:46 -07:00
James R. Barlow
c6b9a49cbb
Fix tests that fail in CI 2020-06-10 17:08:00 -07:00
James R. Barlow
64891c2fc3
Pre-release delinting 2020-06-09 15:27:14 -07:00
James R. Barlow
fe156db41d
Merge branch 'release/v10' into trialmerge 2020-06-09 15:12:56 -07:00
James R. Barlow
0f942fb714
Rename ocrmypdf.exec -> ocrmypdf._exec 2020-06-09 14:59:09 -07:00
James R. Barlow
b109445215
Move Ghostscript rasterize_pdf to plugin 2020-06-08 17:10:27 -07:00
James R. Barlow
5e14d5b0dd
Fix test_report_file_size
Use more realistic test data
2020-06-03 13:24:55 -07:00
James R. Barlow
d43212d30b
Refactor --language argument into set 2020-05-25 03:20:10 -07:00
James R. Barlow
a0f9ca3a30
Move Tesseract options validation into plugin 2020-05-25 01:31:46 -07:00
James R. Barlow
9bccff4f88
Move Tesseract specific arguments to plugin 2020-05-16 03:24:31 -07:00
James R. Barlow
2bd586e093
Compare requested languages to OCR engine instead of tesseract directly
Also refactor to facilitate validation that needs the plugin manager.
2020-05-16 01:50:37 -07:00
James R. Barlow
2541f6cf89
Fix missing jbig2enc reported as error with -O3 instead of warning
Fixes #558
2020-05-12 01:05:57 -07:00