James R. Barlow
7cd2770a13
Fix issue #137 - proportions of non-square resolution distorted
...
Distortion mainly affected --force-ocr
v4.5.1
2017-02-26 17:13:16 -08:00
James R. Barlow
7b94129d9e
v4.5 notes
v4.5
2017-02-14 13:03:48 -08:00
James R. Barlow
d1a0065ef8
Create test case for Form XObjects
2017-02-14 12:51:15 -08:00
James R. Barlow
5a817370fd
Warn more strongly about --pdf-renderer tesseract until fix is widely propagated
2017-02-14 11:33:07 -08:00
James R. Barlow
ab0a210763
Update dockerfile.tess4 yet again
...
Installing Tess4 PPA over Tess3 proved too much pain, so sever the link
between this and the jbarlow83/ocrmypdf image, starting each from
scratch. Also the complete set of language packs proves too much - the
build seems likely to fail when trying to install so many.
2017-02-14 11:32:05 -08:00
James R. Barlow
9f800736bc
Fix running_in_docker() check failing on newer Docker
...
This test has to work to ensure spoof/tesseract_cache.py has a writable
directory to put cache into. Otherwise those tests fail.
2017-02-13 02:16:06 -08:00
James R. Barlow
c9a83afad6
Improve batch processing examples
2017-02-13 02:14:32 -08:00
James R. Barlow
5e14274f10
pageinfo: learn to extract image information from Form XObjects
2017-02-11 16:48:59 -08:00
James R. Barlow
167470b4bd
Re-fix Dockerfile.tess4
...
[ci skip]
2017-02-10 15:43:49 -08:00
James R. Barlow
f06d3c2ec2
Fix tesseract 3.04 on tesseract 4 on image
...
[skip ci]
2017-02-10 08:38:22 -08:00
James R. Barlow
74c99a8a77
v4.4.2 release notes
v4.4.2
2017-02-06 21:56:55 -08:00
James R. Barlow
0e4d312ee2
Adjust Travis deploy to PyPI settings
...
-only on master branch
-only Python 3.6 build uploads, so the others don’t compete
-don’t upload docs to PyPI
2017-02-06 21:27:59 -08:00
James R. Barlow
589f19559d
Rewrite Dockerfiles to use ubuntu 16.10 base system
...
Debian now has a few disadvantages:
-there is no convenient PPA for Debian tesseract 4.0, but there is for
Ubuntu
-Ubuntu sets locale to UTF-8 automatically, removing the need to do this
All three ocrmypdf docker images are now based on a common Ubuntu
16.10 image, derived from the one used to build ocrmypdf-tess4.
-polyglot now differs from -tess4 only by opting into the tess4 PPA.
Both Ubuntu 16.10 and Debian stretch now use tesseract 3.04.01, making
the sharp.ttf patch unnecessary. /etc/apt/sources has been unused for a
while, now that both have newer Ghostscripts.
2017-02-06 14:39:29 -08:00
James R. Barlow
f28bc25dc0
Configure travis to handle deployment to PyPI; also lint .travis.yml
2017-02-06 13:50:53 -08:00
James R. Barlow
a0657ad937
Prevent use of --pdf-renderer tess4 on tesseract 3
2017-02-06 13:49:43 -08:00
James R. Barlow
5b8d88af4c
Suggest use of aliases to hide docker run
2017-01-30 15:08:02 -08:00
James R. Barlow
fa82b50340
Adding missing file Dockerfile.tess4
2017-01-29 18:34:01 -08:00
James R. Barlow
005216bc57
Support ocrmypdf-tess4
2017-01-29 18:26:52 -08:00
James R. Barlow
e748fdcf6f
v4.4.1 release notes
v4.4.1
2017-01-28 22:23:35 -08:00
James R. Barlow
8c17c9918e
Add documentation and test cases for --tesseract-config
...
This parameter has existed for a long time but never really got any
attention.
2017-01-28 22:06:51 -08:00
James R. Barlow
ea0dd99d0b
More documentation updates
2017-01-28 15:35:59 -08:00
James R. Barlow
e0cc67afae
docs: suggest --oem 1
2017-01-28 14:58:25 -08:00
James R. Barlow
04f9cbe364
Describe how to use tesseract 4.0 while 3.04 is installed
2017-01-27 18:13:59 -08:00
James R. Barlow
99afebd033
tesseract jobs_limit(2)
...
At least on macOS with my quad-core, performance improves with two
tesseracts in parallel (20% gain). Hard to say how this will affect
Linux, but stepping up to 2 jobs seems justifiable.
2017-01-27 18:13:12 -08:00
James R. Barlow
a6feacc810
travis: fix ‘pip install’ by moving working code out of the way
2017-01-27 14:33:23 -08:00
James R. Barlow
65e4b1672f
cffi: verbose=True
2017-01-27 14:17:13 -08:00
James R. Barlow
46cc0dd190
Revert "Do we need to exclude ocrmypdf.lib?"
...
This reverts commit 678b9fb603e2ce1bc12a34e14a715dcce5fc4a9c.
2017-01-27 13:51:30 -08:00
James R. Barlow
678b9fb603
Do we need to exclude ocrmypdf.lib?
2017-01-27 13:49:11 -08:00
James R. Barlow
49ab0c1f0b
setup.py: cffi is definitely needed in setup_requires
2017-01-27 13:43:47 -08:00
James R. Barlow
ab490a7736
Experiment: update *requirements.txt, use more current travis build steps
...
Perhaps this works around the pip/setup.py asymmetry that broke the
4.4 release.
2017-01-27 13:13:14 -08:00
James R. Barlow
e4ce1dae35
setup.py: for some reason, subpackages must be explicitly specified
v4.4.post1
2017-01-27 00:37:05 -08:00
James R. Barlow
179b812acb
Fix readthedocs build error
2017-01-26 23:57:51 -08:00
jbarlow83
7f170517ec
Note about pytest-helpers-namespace
2017-01-26 23:15:32 -08:00
James R. Barlow
5480da4f04
Additional docs updates for v4.4
v4.4
2017-01-26 23:02:44 -08:00
James R. Barlow
9a15a4db10
Ensure specified destination is writable before starting pipeline process
2017-01-26 22:08:24 -08:00
James R. Barlow
55aeaec293
Autorotation check: Replace duplicated tests with parameterized test
2017-01-26 18:07:59 -08:00
James R. Barlow
f6df1fb40c
Fix test suite regression: output files dumped in tests/resources
2017-01-26 18:07:09 -08:00
James R. Barlow
b889a89c36
Fix remaining 3.4/3.5 regressions
2017-01-26 17:53:27 -08:00
James R. Barlow
1976dc6f30
Fix issue #121 “pop from empty list” (content stream parsing error)
2017-01-26 17:24:40 -08:00
James R. Barlow
e864c65d26
(Hopefully) Fix Path <-> py.path conversion on Py3.4/3.5
2017-01-26 17:19:15 -08:00
James R. Barlow
02fba02d31
Refactor test suite to use fixtures to manage paths
2017-01-26 16:38:59 -08:00
James R. Barlow
fb9e7c82f6
Move duplicate test code into common namespace
2017-01-26 13:36:52 -08:00
James R. Barlow
77d31bf646
Add renderers page (missed from previous)
2017-01-26 13:20:44 -08:00
James R. Barlow
29ca799bcf
Move pytest.ini into setup.cfg
2017-01-26 12:45:38 -08:00
James R. Barlow
467b7f0163
Update docs for eventual v4.4 release
2017-01-26 12:29:11 -08:00
James R. Barlow
bad67c6dc5
Rename ‘tesstop’ to ‘tess4’
...
There’s no reason text-only PDF shouldn’t become the default for
tesseract 4.
2017-01-26 12:28:51 -08:00
James R. Barlow
ac40426971
Implement “tesstop” (tesseract v4 text-only pages - working name)
2017-01-20 17:16:01 -08:00
James R. Barlow
7acfaf6d34
pipeline: rename some of the stages, for clarity
2017-01-20 17:15:00 -08:00
James R. Barlow
99e47c9c04
tesseract: add support for using v4 textonly_pdf feature
2017-01-20 17:06:23 -08:00
James R. Barlow
d7904e2251
Travis now has Python 3.6, test against it
2017-01-20 14:26:17 -08:00