jhgarrison
5f47aac36f
Add installation instructions for Windows/Cygwin64 (#571)
...
Co-authored-by: Jim Garrison <bitbucket@jhmg.net>
2020-06-03 13:16:23 -07:00
James R. Barlow
c6b2fa8851
Remove unpaper spoof; no plugin needed
2020-06-02 02:42:14 -07:00
James R. Barlow
1b92f447c3
Convert tesseract_crash to plugin
2020-06-02 02:36:41 -07:00
James R. Barlow
82e7eb91d2
Tidy tesseract_noop
2020-06-02 01:50:02 -07:00
James R. Barlow
4f4ad0fb76
Convert tesseract_big_image_error to plugin
2020-06-02 01:49:47 -07:00
James R. Barlow
1d0b8641a0
Improve file size increase warning to account for changes to small files
...
Fixes #569
2020-06-02 00:35:59 -07:00
James R. Barlow
daca919775
Mark pdfminer.six 20200517 as supported
2020-06-02 00:11:02 -07:00
James R. Barlow
1598f2f0e5
Abolish spoof_tesseract_noop
2020-06-01 03:07:53 -07:00
James R. Barlow
2b23f7ec73
tesseract_noop: begin implementing with plugin
2020-06-01 02:45:49 -07:00
James R. Barlow
6528234608
Fix tesseract_ocr.py errors
2020-06-01 02:27:27 -07:00
James R. Barlow
642ebc6098
Fix test that failed on Windows
v9.8.1
2020-05-28 15:52:00 -07:00
James R. Barlow
74fdfeea3f
v9.8.1 notes
2020-05-28 15:04:23 -07:00
James R. Barlow
3754185f56
Mark pdfminer.six 20200517 as supported
2020-05-28 15:01:51 -07:00
James R. Barlow
df9f5157bd
Fix shim_paths to account for unexpected files in Program Files\gs
...
Fixes #565
2020-05-28 14:58:41 -07:00
James R. Barlow
aa060db5bc
Refactor tesseract_env variable into the plugin
...
Removed all cases except one in api.py, which isn't worth solving because
it should be removed anyway.
This also fixes a logic error in the OMP_THREAD_LIMIT decision: api.py
did not pass kwargs correctly, so they never worked before.
2020-05-26 02:14:06 -07:00
James R. Barlow
d43212d30b
Refactor --language argument into set
2020-05-25 03:20:10 -07:00
James R. Barlow
a0f9ca3a30
Move Tesseract options validation into plugin
2020-05-25 01:31:46 -07:00
James R. Barlow
0cefe886ec
Update email
2020-05-19 16:12:36 -07:00
James R. Barlow
f656c00f41
docs: Note about OCRmyPDF speed
2020-05-18 01:27:45 -07:00
James R. Barlow
03da34ee24
Test files needed!
2020-05-16 17:04:44 -07:00
James R. Barlow
9bccff4f88
Move Tesseract specific arguments to plugin
2020-05-16 03:24:31 -07:00
James R. Barlow
2bd586e093
Compare requested languages to OCR engine instead of tesseract directly
...
Also refactoring to facilitate validation that needs the plugin manager.
2020-05-16 01:50:37 -07:00
James R. Barlow
9af94ac9b7
pipeline: use OCR engine abstraction instead of Tesseract
2020-05-16 01:28:56 -07:00
James R. Barlow
8174089c8b
Begin transforming Tesseract into pluggable OCR engine
2020-05-14 03:54:21 -07:00
James R. Barlow
41eb54cc0a
Standardize tesseract.generate_hocr and _pdf parameters
2020-05-14 03:23:25 -07:00
James R. Barlow
12a2f78c4d
Fix validation of languages not using tesseract_env
...
And some related issues.
2020-05-14 03:19:22 -07:00
James R. Barlow
d372f1f7fa
Remove "skip page" from tesseract interface
...
Breaks tests/test_main.py::test_tesseract_missing_tessdata because
conftest.py does not update options.tesseract_env before testing options
for some reason, and tesseract.has_textonly_pdf raises an exception
instead of returning False as the test assumes.
2020-05-12 04:09:42 -07:00
James R. Barlow
6f5b75bcd0
Remove lru_cache on get_version
...
Does not play well with forking.
2020-05-12 03:51:48 -07:00
James R. Barlow
a2d3e0b53e
Convert remaining imports to absolute
2020-05-12 02:12:08 -07:00
James R. Barlow
7f67556995
ocrmypdf.__init__: Hide _HookimplMarker
2020-05-12 01:35:45 -07:00
James R. Barlow
db8c37e58c
Refactor ocrmypdf.exec.__init__.py
2020-05-12 01:34:10 -07:00
James R. Barlow
a87c81a64f
helpers: remove unnecessary isinstance test
2020-05-12 01:28:50 -07:00
James R. Barlow
4b986a5943
cli: make ArgumentParser._api_mode private
2020-05-12 01:28:36 -07:00
James R. Barlow
2fae9b655e
Remove **kwargs from check_external_program; deprecated
2020-05-12 01:07:01 -07:00
James R. Barlow
2541f6cf89
Fix missing jbig2enc reported as error with -O3 instead of warning
...
Fixes #558
2020-05-12 01:05:57 -07:00
James R. Barlow
33b68454f3
watcher: clean up getenv casting
2020-05-08 03:49:49 -07:00
James R. Barlow
977665d2b6
Delint some tests
2020-05-08 03:49:33 -07:00
James R. Barlow
fd7497f00d
Remove old function tesseract.v4()
2020-05-08 03:44:39 -07:00
James R. Barlow
790ff58f67
Add fix for bug in Windows Python 3.6/3.7
...
TypeError: argument of type 'WindowsPath' is not iterable
2020-05-07 22:19:21 -07:00
James R. Barlow
4b98ce391b
docs: rename security->pdfsecurity so GitHub won't misinterpret it
2020-05-07 03:54:27 -07:00
James R. Barlow
417dbd43f6
docs: plugin documentation
2020-05-07 03:53:37 -07:00
James R. Barlow
7a12908db9
Relocate example plugin
2020-05-07 03:27:39 -07:00
James R. Barlow
9462f0a28f
graft: more refactoring
2020-05-07 02:59:24 -07:00
James R. Barlow
e760622a5c
graft: refactor
2020-05-07 02:03:42 -07:00
James R. Barlow
1b086f60a9
tesseract.py: api cleanup
2020-05-06 12:37:44 -07:00
James R. Barlow
85cbf94a6e
Convert many uses of str paths to Path
2020-05-06 02:53:47 -07:00
James R. Barlow
6f4286e1b1
New hook: filter_page_image
2020-05-06 02:24:07 -07:00
James R. Barlow
39888ae8c9
Rename install_cli to add_options
2020-05-06 01:10:09 -07:00
James R. Barlow
dd361ecd05
Support importing plugin by filename
2020-05-06 00:44:40 -07:00
James R. Barlow
32759c9025
Change argument from --plugins to --plugin
2020-05-06 00:43:40 -07:00