2676 Commits

Author SHA1 Message Date
James R. Barlow
393c5a9ea4 Fix error on -l lang1+lang2 2020-06-12 12:10:29 -07:00
James R. Barlow
c6b9a49cbb Fix tests that fail in CI (tag: v10.0.0) 2020-06-10 17:08:00 -07:00
James R. Barlow
17a4831745 v10 release notes and dependencies 2020-06-10 14:27:47 -07:00
James R. Barlow
7caf1e85ff info: change "Scan" message 2020-06-10 12:11:37 -07:00
James R. Barlow
f59a757e8b info: tidy handling of content streams 2020-06-10 12:09:24 -07:00
James R. Barlow
872bafad4b Reinstate quick test for text/no text
Partial revert of commit 991db17
2020-06-10 12:00:52 -07:00
James R. Barlow
8599400445 Only do page analysis on pages we will do OCR on 2020-06-10 11:33:27 -07:00
James R. Barlow
b6eebadf05 Use pikepdf.open with block to manage PdfInfo 2020-06-10 11:32:46 -07:00
James R. Barlow
a4e88eb8f0 Simplify plugin_manager pickling 2020-06-10 00:41:19 -07:00
James R. Barlow
f6257c2183 subprocess: lru_cache version checks 2020-06-10 00:32:06 -07:00
James R. Barlow
64891c2fc3 Pre-release delinting 2020-06-09 15:27:14 -07:00
James R. Barlow
fe156db41d Merge branch 'release/v10' into trialmerge 2020-06-09 15:12:56 -07:00
James R. Barlow
0f942fb714 Rename ocrmypdf.exec -> ocrmypdf._exec 2020-06-09 14:59:09 -07:00
James R. Barlow
be8ca589d4 Move ocrmypdf.exec.run and friends to ocrmypdf.subprocess 2020-06-09 14:53:10 -07:00
James R. Barlow
3b6f6782f0 Remove tesseract_env, --tesseract-env 2020-06-09 00:39:53 -07:00
James R. Barlow
21c0e045cb Remove _OCRMYPDF_TEST_PATH environment variable 2020-06-09 00:30:13 -07:00
James R. Barlow
ebbf68bd08 The big payoff: abolishing spoofing machinery 2020-06-09 00:08:20 -07:00
James R. Barlow
2059e916da Convert all ghostscript spoofs to test plugins 2020-06-09 00:00:25 -07:00
James R. Barlow
c22f245606 Plugins must return not-None if they intend to stop builtin 2020-06-08 23:48:45 -07:00
James R. Barlow
7b9025f397 Convert generate_pdfa to plugin 2020-06-08 22:28:38 -07:00
James R. Barlow
b109445215 Move Ghostscript rasterize_pdf to plugin 2020-06-08 17:10:27 -07:00
James R. Barlow
fd1cd8e50a docs: explain --rotate-pages-threshold 2020-06-08 07:46:55 -07:00
James R. Barlow
c6c70c2171 docs: Ubuntu 20.04 install instructions 2020-06-08 07:42:13 -07:00
James R. Barlow
a9a473f2e5 Convert all tesseract cache usages to plugin 2020-06-05 17:55:18 -07:00
James R. Barlow
6268e2faff Begin replacing tests/spoof/tesseract_cache with plugin 2020-06-05 17:27:10 -07:00
James R. Barlow
ec3f506500 Convert tesseract_badutf8 to plugin 2020-06-05 16:38:19 -07:00
James R. Barlow
00daa51a73 v9.8.2 release notes (tag: v9.8.2) 2020-06-03 13:28:35 -07:00
James R. Barlow
e60f4d3f43 docs: tidy Cygwin install 2020-06-03 13:27:05 -07:00
James R. Barlow
7460745f80 Merge branch 'master' of github.com:jbarlow83/OCRmyPDF 2020-06-03 13:25:12 -07:00
James R. Barlow
5e14d5b0dd Fix test_report_file_size
Use more realistic test data
2020-06-03 13:24:55 -07:00
James R. Barlow
d118132fa6 layout: look for text in XObjects too 2020-06-03 13:16:55 -07:00
jhgarrison
5f47aac36f Add installation instructions for Windows/Cygwin64 (#571)
Co-authored-by: Jim Garrison <bitbucket@jhmg.net>
2020-06-03 13:16:23 -07:00
James R. Barlow
c6b2fa8851 Remove unpaper spoof; no plugin needed 2020-06-02 02:42:14 -07:00
James R. Barlow
1b92f447c3 Convert tesseract_crash to plugin 2020-06-02 02:36:41 -07:00
James R. Barlow
82e7eb91d2 Tidy tesseract_noop 2020-06-02 01:50:02 -07:00
James R. Barlow
4f4ad0fb76 Convert tesseract_big_image_error to plugin 2020-06-02 01:49:47 -07:00
James R. Barlow
1d0b8641a0 Improve file size increase warning to account for changes to small files
Fixes #569
2020-06-02 00:35:59 -07:00
James R. Barlow
daca919775 Mark pdfminer.six 20200517 as supported 2020-06-02 00:11:02 -07:00
James R. Barlow
1598f2f0e5 Abolish spoof_tesseract_noop 2020-06-01 03:07:53 -07:00
James R. Barlow
2b23f7ec73 tesseract_noop: begin implementing with plugin 2020-06-01 02:45:49 -07:00
James R. Barlow
6528234608 Fix tesseract_ocr.py errors 2020-06-01 02:27:27 -07:00
James R. Barlow
642ebc6098 Fix test that failed on Windows (tag: v9.8.1) 2020-05-28 15:52:00 -07:00
James R. Barlow
74fdfeea3f v9.8.1 notes 2020-05-28 15:04:23 -07:00
James R. Barlow
3754185f56 Mark pdfminer.six 20200517 as supported 2020-05-28 15:01:51 -07:00
James R. Barlow
df9f5157bd Fix shim_paths to account for unexpected files in Program Files\gs
Fixes #565
2020-05-28 14:58:41 -07:00
James R. Barlow
aa060db5bc Refactor tesseract_env variable into the plugin
Removed all cases except one in api.py, which isn't worth solving because
it should be removed anyway.

This also fixes a logic error in the OMP_THREAD_LIMIT decision; api.py
did not pass kwargs correctly, so they never worked before.
2020-05-26 02:14:06 -07:00
James R. Barlow
d43212d30b Refactor --language argument into set 2020-05-25 03:20:10 -07:00
James R. Barlow
a0f9ca3a30 Move Tesseract options validation into plugin 2020-05-25 01:31:46 -07:00
James R. Barlow
0cefe886ec Update email 2020-05-19 16:12:36 -07:00
James R. Barlow
f656c00f41 docs: Note about OCRmyPDF speed 2020-05-18 01:27:45 -07:00