12 Commits

Author SHA1 Message Date
James R. Barlow
f91faf9795 Add new argument --tesseract-thresholding to control tesseract thresholding where available
Also add missing test for --tesseract-oem
2021-12-06 15:38:14 -08:00
James R. Barlow
acc9d58c39 Skip no language test for Tess 5 2021-11-13 01:37:27 -08:00
James R. Barlow
173a80864d
Delinting 2021-04-07 02:09:45 -07:00
James R. Barlow
aa115a8be3
Remove pytest_helpers_namespace 2021-04-07 01:56:51 -07:00
James R. Barlow
b267494e4a
Create raster PDF pages to match input page size
Previously we produced a raster image, then multiplied image width
by DPI to get the page size. However, if there is rounding, the
page size may not match exactly. In this modified approach we
constrain the page size to match.
2021-01-08 15:10:43 -08:00
James R. Barlow
aa0ec40102
Change license of all GPLv3 files to MPL-2.0
https://github.com/jbarlow83/OCRmyPDF/issues/600
2020-08-05 00:44:42 -07:00
James R. Barlow
0f942fb714 Rename ocrmypdf.exec -> ocrmypdf._exec 2020-06-09 14:59:09 -07:00
James R. Barlow
3b6f6782f0
Remove tesseract_env, --tesseract-env 2020-06-09 00:39:53 -07:00
James R. Barlow
a9a473f2e5 Convert all tesseract cache usages to plugin 2020-06-05 17:55:18 -07:00
James R. Barlow
41eb54cc0a
Standardize tesseract.generate_hocr and _pdf parameters 2020-05-14 03:23:25 -07:00
James R. Barlow
d372f1f7fa Remove "skip page" from tesseract interface
Breaks tests/test_main.py::test_tesseract_missing_tessdata because
conftest.py does not update options.tesseract_env before testing options
for some reason, and tesseract.has_textonly_pdf raises an exception
instead of returning False as the test assumes.
2020-05-12 04:09:42 -07:00
James R. Barlow
fd7497f00d
Remove old function tesseract.v4() 2020-05-08 03:44:39 -07:00