21 Commits

Author SHA1 Message Date
James R. Barlow
746969207a Remove deprecated --pdf-renderer tess4, which was renamed to sandwich
Should have been cut in v6.0.0
2018-03-26 01:17:22 -07:00
James R. Barlow
8975b72a01 Fix test_testonly_pdf generating an output file in pwd 2018-03-24 22:34:35 -07:00
James R. Barlow
11d74dea09 Remove the OCRMYPDF_program environment variables
Really, this was just replicating the functionality of the PATH
environment variable, and users probably do that anyway.
2018-03-24 15:07:02 -07:00
James R. Barlow
6756016572 Add license notice to all files
Source files to GPL3

Exceptions:
- tests/spoof/* to MIT
- hocrtransform.py
- _unicodefun.py

Test resources to CC BY-SA 4.0 except when otherwise noted.

Add GPL license.
2018-03-24 02:33:24 -07:00
James R. Barlow
45c7bd9a60 lint: Remove shebangs from non-executable files 2018-02-24 12:38:58 -08:00
James R. Barlow
2c24f67deb Rename “tess4” renderer to “sandwich” and make it default in Tess 3.05.01
Tesseract 3.05.01 backported the textonly_pdf=1 option, which allows the use
of this superior PDF renderer prior to 4.00 alpha. This means that
the tess4 name is no longer accurate, so call it a sandwich because of
its merge-preserve characteristic. Keep tess4 as a deprecated alias. Fix the
documentation and tests to reflect this.

Make it the default, because it’s better. Unlike the “tesseract” renderer
prior to Tesseract 3.05.00, it does not have problems with PDFs that
Ghostscript corrupts, and it produces better output without
re-rasterizing.

Deprecate some old stuff to avoid the test suite growing obscenely
large.
2017-06-13 13:09:12 -07:00
James R. Barlow
47298be132 Remove Python <3.5 test 2017-06-13 10:14:28 -07:00
James R. Barlow
3d2f6f0772 Fix tess4 test using old-style pageinfo API 2017-05-29 13:51:21 -07:00
James R. Barlow
08e47117a3 Rename pageinfo to pdfinfo 2017-05-19 15:48:23 -07:00
James R. Barlow
56d2aae963 Refactor from ImageInfo index to attribute accessing 2017-05-18 18:39:14 -07:00
James R. Barlow
cd04ae6949 Refactor PdfInfo(str(filename)) -> PdfInfo(filename) 2017-05-18 16:43:50 -07:00
James R. Barlow
b0e95842b8 Fix Travis CI errors while looking around for Tess4 2017-05-12 00:40:00 -07:00
James R. Barlow
183eafa587 Implement sidecar text files (#126) 2017-05-10 15:22:44 -07:00
James R. Barlow
7b7e3a3e03 Enable lossless reconstruction for --pdf-renderer tess4 where appropriate 2017-03-29 23:44:12 -07:00
James R. Barlow
1e7fbd4202 Fix issues with --pdf-renderer tess4 page skipping
If the tess4 renderer needed to skip OCR on a page, it would end up
duplicating the page contents onto the new page, rather than creating
a blank OCR layer and placing it on the output page. This created
duplicated content in output files.
2017-03-29 23:43:26 -07:00
James R. Barlow
8ddbe81513 Fix issue #147: unpaper loses DPI information, affects --pdf-renderer tess4 2017-03-24 13:23:03 -07:00
James R. Barlow
a0657ad937 Prevent use of --pdf-renderer tess4 on tesseract 3 2017-02-06 13:49:43 -08:00
James R. Barlow
9a15a4db10 Ensure specified destination is writable before starting pipeline process 2017-01-26 22:08:24 -08:00
James R. Barlow
02fba02d31 Refactor test suite to use fixtures to manage paths 2017-01-26 16:38:59 -08:00
James R. Barlow
fb9e7c82f6 Move duplicate test code into common namespace 2017-01-26 13:36:52 -08:00
James R. Barlow
bad67c6dc5 Rename ‘tesstop’ to ‘tess4’
There’s no reason text-only PDF shouldn’t become the default for
tesseract 4.
2017-01-26 12:28:51 -08:00