2676 Commits

Author SHA1 Message Date
James R. Barlow
7cd2770a13 Fix issue #137 - proportions of non-square resolution distorted
Distortion mainly affected --force-ocr
v4.5.1
2017-02-26 17:13:16 -08:00
James R. Barlow
7b94129d9e v4.5 notes v4.5 2017-02-14 13:03:48 -08:00
James R. Barlow
d1a0065ef8 Create test case for Form XObjects 2017-02-14 12:51:15 -08:00
James R. Barlow
5a817370fd Warn more strongly about --pdf-renderer tesseract until fix is widely propagated 2017-02-14 11:33:07 -08:00
James R. Barlow
ab0a210763 Update dockerfile.tess4 yet again
Installing Tess4 PPA over Tess3 proved too much pain, so sever the link
between this and the jbarlow83/ocrmypdf image, starting each from
scratch. Also the complete set of language packs proves too much - the
build seems likely to fail when trying to install so many.
2017-02-14 11:32:05 -08:00
James R. Barlow
9f800736bc Fix running_in_docker() check failing on newer Docker
This test has to work to ensure spoof/tesseract_cache.py has a writable
directory to put cache into. Otherwise those tests fail.
2017-02-13 02:16:06 -08:00
James R. Barlow
c9a83afad6 Improve batch processing examples 2017-02-13 02:14:32 -08:00
James R. Barlow
5e14274f10 pageinfo: learn to extract image information from Form XObjects 2017-02-11 16:48:59 -08:00
James R. Barlow
167470b4bd Re-fix Dockerfile.tess4
[ci skip]
2017-02-10 15:43:49 -08:00
James R. Barlow
f06d3c2ec2 Fix tesseract 3.04 on tesseract 4 on image
[skip ci]
2017-02-10 08:38:22 -08:00
James R. Barlow
74c99a8a77 v4.4.2 release notes v4.4.2 2017-02-06 21:56:55 -08:00
James R. Barlow
0e4d312ee2 Adjust Travis deploy to PyPI settings
-only on master branch
-only Python 3.6 build uploads, so the others don’t compete
-don’t upload docs to PyPI
2017-02-06 21:27:59 -08:00
James R. Barlow
589f19559d Rewrite Dockerfiles to use ubuntu 16.10 base system
Debian now has a few disadvantages:
-there is no convenient PPA for Debian tesseract 4.0, but there is for
Ubuntu
-Ubuntu sets the locale to UTF-8 automatically, removing the need to do this

All three ocrmypdf docker images are now based on a common Ubuntu
16.10 image, derived from the one used to build ocrmypdf-tess4.
-polyglot now differs from -tess4 only by opting into the tess4 PPA.

Both Ubuntu 16.10 and Debian stretch now use tesseract 3.04.01, making
the sharp.ttf patch unnecessary. /etc/apt/sources has been unused for a
while, now that both have newer Ghostscripts.
2017-02-06 14:39:29 -08:00
James R. Barlow
f28bc25dc0 Configure travis to handle deployment to PyPI; also lint .travis.yml 2017-02-06 13:50:53 -08:00
James R. Barlow
a0657ad937 Prevent use of --pdf-renderer tess4 on tesseract 3 2017-02-06 13:49:43 -08:00
James R. Barlow
5b8d88af4c Suggest use of aliases to hide docker run 2017-01-30 15:08:02 -08:00
James R. Barlow
fa82b50340 Adding missing file Dockerfile.tess4 2017-01-29 18:34:01 -08:00
James R. Barlow
005216bc57 Support ocrmypdf-tess4 2017-01-29 18:26:52 -08:00
James R. Barlow
e748fdcf6f v4.4.1 release notes v4.4.1 2017-01-28 22:23:35 -08:00
James R. Barlow
8c17c9918e Add documentation and test cases for --tesseract-config
This parameter has existed for a long time but never really got any
attention.
2017-01-28 22:06:51 -08:00
James R. Barlow
ea0dd99d0b More documentation updates 2017-01-28 15:35:59 -08:00
James R. Barlow
e0cc67afae docs: suggest --oem 1 2017-01-28 14:58:25 -08:00
James R. Barlow
04f9cbe364 Describe how to use tesseract 4.0 while 3.04 is installed 2017-01-27 18:13:59 -08:00
James R. Barlow
99afebd033 tesseract jobs_limit(2)
At least on macOS with my quad-core, performance improves with two
tesseracts in parallel (20% gain). Hard to say how this will affect
Linux, but stepping up to 2 jobs seems justifiable.
2017-01-27 18:13:12 -08:00
James R. Barlow
a6feacc810 travis: fix ‘pip install’ by moving working code out of the way 2017-01-27 14:33:23 -08:00
James R. Barlow
65e4b1672f cffi: verbose=True 2017-01-27 14:17:13 -08:00
James R. Barlow
46cc0dd190 Revert "Do we need to exclude ocrmypdf.lib?"
This reverts commit 678b9fb603e2ce1bc12a34e14a715dcce5fc4a9c.
2017-01-27 13:51:30 -08:00
James R. Barlow
678b9fb603 Do we need to exclude ocrmypdf.lib? 2017-01-27 13:49:11 -08:00
James R. Barlow
49ab0c1f0b setup.py: cffi is definitely needed in setup_requires 2017-01-27 13:43:47 -08:00
James R. Barlow
ab490a7736 Experiment: update *requirements.txt, use more current travis build steps
Perhaps this works around the pip/setup.py asymmetry that broke the
4.4 release.
2017-01-27 13:13:14 -08:00
James R. Barlow
e4ce1dae35 setup.py: for some reason, subpackages must be explicitly specified v4.4.post1 2017-01-27 00:37:05 -08:00
James R. Barlow
179b812acb Fix readthedocs build error 2017-01-26 23:57:51 -08:00
jbarlow83
7f170517ec Note about pytest-helpers-namespace 2017-01-26 23:15:32 -08:00
James R. Barlow
5480da4f04 Additional docs updates for v4.4 v4.4 2017-01-26 23:02:44 -08:00
James R. Barlow
9a15a4db10 Ensure specified destination is writable before starting pipeline process 2017-01-26 22:08:24 -08:00
James R. Barlow
55aeaec293 Autorotation check: Replace duplicated tests with parameterized test 2017-01-26 18:07:59 -08:00
James R. Barlow
f6df1fb40c Fix test suite regression: output files dumped in tests/resources 2017-01-26 18:07:09 -08:00
James R. Barlow
b889a89c36 Fix remaining 3.4/3.5 regressions 2017-01-26 17:53:27 -08:00
James R. Barlow
1976dc6f30 Fix issue #121 “pop from empty list” (content stream parsing error) 2017-01-26 17:24:40 -08:00
James R. Barlow
e864c65d26 (Hopefully) Fix Path <-> py.path conversion on Py3.4/3.5 2017-01-26 17:19:15 -08:00
James R. Barlow
02fba02d31 Refactor test suite to use fixtures to manage paths 2017-01-26 16:38:59 -08:00
James R. Barlow
fb9e7c82f6 Move duplicate test code into common namespace 2017-01-26 13:36:52 -08:00
James R. Barlow
77d31bf646 Add renderers page (missed from previous) 2017-01-26 13:20:44 -08:00
James R. Barlow
29ca799bcf Move pytest.ini into setup.cfg 2017-01-26 12:45:38 -08:00
James R. Barlow
467b7f0163 Update docs for eventual v4.4 release 2017-01-26 12:29:11 -08:00
James R. Barlow
bad67c6dc5 Rename ‘tesstop’ to ‘tess4’
There’s no reason text-only PDF shouldn’t become the default for
tesseract 4.
2017-01-26 12:28:51 -08:00
James R. Barlow
ac40426971 Implement “tesstop” (tesseract v4 text-only pages - working name) 2017-01-20 17:16:01 -08:00
James R. Barlow
7acfaf6d34 pipeline: rename some of the stages, for clarity 2017-01-20 17:15:00 -08:00
James R. Barlow
99e47c9c04 tesseract: add support for using v4 textonly_pdf feature 2017-01-20 17:06:23 -08:00
James R. Barlow
d7904e2251 Travis now has Python 3.6, test against it 2017-01-20 14:26:17 -08:00