2676 Commits

Author SHA1 Message Date
James R. Barlow
cd1a99a0de Refactor int(os.path.basename(s)[0:6]) -> page_number(s) 2017-06-26 13:29:40 -07:00
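A minimal sketch of the helper this refactor introduces. Only the expression `int(os.path.basename(s)[0:6])` and the name `page_number` come from the commit message. The example paths, and the assumption that working files begin with a six-digit zero-padded page number, are illustrative.

```python
import os


def page_number(input_file):
    """Return the page number encoded in the first six characters of the basename."""
    # Assumes names such as "000012.page.png"; the naming scheme is an assumption.
    return int(os.path.basename(input_file)[0:6])


if __name__ == "__main__":
    print(page_number("/tmp/work/000012.page.png"))
```

Giving the slice a name keeps the six-character convention in one place instead of repeating it at each call site.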
James R. Barlow
48e3b267fc Accept PDFs with whitespace ahead of %PDF marker
Noticed in @aagahi 's fork
2017-06-26 13:17:47 -07:00
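One way such a check might look, as a sketch only. The function name `has_pdf_marker` is hypothetical and this is not necessarily how the project implements it.

```python
def has_pdf_marker(header: bytes) -> bool:
    """Return True if the data starts with %PDF, ignoring leading whitespace."""
    # bytes.lstrip() with no argument strips leading ASCII whitespace.
    return header.lstrip().startswith(b"%PDF")


if __name__ == "__main__":
    print(has_pdf_marker(b"\n  %PDF-1.4\n"))
    print(has_pdf_marker(b"GIF89a"))
```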
James R. Barlow
3a7c3417bb Don’t check tags and branch at the same time as Travis doesn’t get this
Travis is weird
v5.2
2017-06-13 13:14:34 -07:00
James R. Barlow
d792ef7222 Give the ‘auto’ renderer setting more test covfefe 2017-06-13 13:13:58 -07:00
James R. Barlow
2c24f67deb Rename “tess4” renderer to “sandwich” and make it default in Tess 3.05.01
Tesseract 3.05.01 backported the textonly_pdf=1 which allows the use
of this superior PDF renderer prior to 4.00 alpha. This means that
the tess4 name is no longer accurate, so call it a sandwich because of
its merge-preserve characteristic. Preserve the tess4 name. Fix the
documentation and tests to reflect this.

Make it the default, because it’s better. It does not have the issues
the “tesseract” renderer does prior to Tess 3.05.00 with rendering
PDFs that Ghostscript corrupts, and it produces better output without
re-rastering.

Deprecate some old stuff to avoid the test suite growing obscenely
large.
2017-06-13 13:09:12 -07:00
James R. Barlow
9e75e28d0c Homebrew needs x11 to compile Pillow 2017-06-13 11:03:26 -07:00
James R. Barlow
3232643809 Support “textonly PDF” renderer in Tesseract 3.05.01 2017-06-13 10:18:08 -07:00
James R. Barlow
f7ee9e90ce Document what is meant by the ocrmypdf “API” 2017-06-13 10:15:11 -07:00
James R. Barlow
47298be132 Remove Python <3.5 test 2017-06-13 10:14:28 -07:00
James R. Barlow
a88fa83515 Travis: fix deploy conditions for homebrew autobrew 2017-05-31 02:29:32 -07:00
James R. Barlow
12bfe20385 v5.1 release notes v5.1 2017-05-29 14:36:50 -07:00
James R. Barlow
3d2f6f0772 Fix tess4 test using old-style pageinfo API 2017-05-29 13:51:21 -07:00
James R. Barlow
1cb607f64b Merge UserUnit 2017-05-29 13:22:55 -07:00
James R. Barlow
d3c54fbbde For --rotate-pages, rasterize preview at half DPI instead of 200 DPI
Ensures that time is not wasted on previews at higher resolution than
the input as was sometimes the case
2017-05-29 13:01:18 -07:00
James R. Barlow
28341b755f Refactor common test fixtures 2017-05-29 12:47:55 -07:00
James R. Barlow
4b5cd420e1 Add new test file 2017-05-29 12:16:08 -07:00
James R. Barlow
1d57bcc99e Fix Ghostscript rasterizing of UserUnit pages and related sizing issues 2017-05-29 12:14:10 -07:00
James R. Barlow
facdd13879 Ghostscript: refactor image output resizing 2017-05-29 11:42:27 -07:00
James R. Barlow
6e891f91d3 ghostscript, qpdf: Restore API backward compatibility 2017-05-29 11:13:06 -07:00
James R. Barlow
9b50ede977 Partially solve ghostscript rasterize_pdf producing wrong file size
Kludge. Assumes JPEG for now. Messy.
2017-05-25 01:17:43 -07:00
James R. Barlow
82cf010333 Error out if trying to produce PDF/A >200” due to Ghostscript limitation 2017-05-25 00:07:29 -07:00
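A rough sketch of a size check of this kind. The function names and the choice of `ValueError` are hypothetical. It assumes page dimensions are given in PDF units of 1/72 inch, scaled by /UserUnit where present, and that the 200 inch limit applies to either dimension.

```python
POINTS_PER_INCH = 72
MAX_PDFA_INCHES = 200  # limit taken from the commit message


def page_size_inches(width_pt, height_pt, user_unit=1.0):
    """Convert page dimensions in PDF units to inches, applying /UserUnit."""
    return (width_pt * user_unit / POINTS_PER_INCH,
            height_pt * user_unit / POINTS_PER_INCH)


def check_pdfa_page_size(width_pt, height_pt, user_unit=1.0):
    """Raise ValueError if either page dimension exceeds the limit."""
    width_in, height_in = page_size_inches(width_pt, height_pt, user_unit)
    if max(width_in, height_in) > MAX_PDFA_INCHES:
        raise ValueError(
            "page is %.1f x %.1f inches; PDF/A output is limited to %d inches"
            % (width_in, height_in, MAX_PDFA_INCHES)
        )


if __name__ == "__main__":
    check_pdfa_page_size(612, 792)  # US Letter, passes silently
    try:
        check_pdfa_page_size(14400, 14400, user_unit=2.0)
    except ValueError as e:
        print(e)
```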
James R. Barlow
6ff6c8614f --output-type=pdf now outputs /UserUnit PDFs at the correct size
This currently distorts the output size because Tesseract assumes it
knows the DPI better than we do.

Does not work for Ghostscript, because it emerges that Ghostscript
honors /UserUnit for rasterizing but not in pdfwrite (resolve/wontfix).

https://bugs.ghostscript.com/show_bug.cgi?id=690781

Ghostscript’s output would need to be patched in a PDF/A safe way for
this to work. Temporary route may be to block Ghostscript if
/UserUnit.
2017-05-24 23:26:07 -07:00
James R. Barlow
eb1cd38f6c Add an open helper that is compatible with pathlib 2017-05-24 16:19:15 -07:00
James R. Barlow
148b632b4f Prove multiprocessing works, although it is still racy in some places 2017-05-23 16:32:13 -07:00
James R. Barlow
591e213713 Add more dependencies for autobrew 2017-05-23 13:52:28 -07:00
James R. Barlow
75f2262659 Ensure JobContext stuff is actually tested for IPC consistency 2017-05-19 17:57:07 -07:00
James R. Barlow
d9005a1074 pdfinfo: replace most remaining dict-style access 2017-05-19 16:17:36 -07:00
James R. Barlow
3e73fa81bf pageinfo: deprecation warning 2017-05-19 16:17:07 -07:00
James R. Barlow
ba6e290231 Restore old pageinfo.py to avoid breaking compatibility 2017-05-19 15:49:23 -07:00
James R. Barlow
08e47117a3 Rename pageinfo to pdfinfo 2017-05-19 15:48:23 -07:00
James R. Barlow
532ef38157 /UserUnit is a scalar, not an array 2017-05-19 14:19:50 -07:00
James R. Barlow
4c09875890 docs: upload unpaper Dropbox link, .rst typo blocking macOS install
[ci skip]
2017-05-19 12:18:09 -07:00
James R. Barlow
0e98139712 Upload to upload.pypi.org/legacy as recommended by PyPA
https://github.com/pypa/warehouse/issues/1996#issuecomment-302784126
2017-05-19 12:06:24 -07:00
James R. Barlow
4c04d802d7 Introduce /UserUnit checking 2017-05-19 12:01:19 -07:00
James R. Barlow
b3dc404571 Update unpaper.deb link (fixes #171)
*Shakes fist at Dropbox*
2017-05-19 11:28:45 -07:00
James R. Barlow
8694f8d2eb Replace magic strings colorspace and encoding with Enums 2017-05-18 22:32:27 -07:00
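To illustrate the kind of change this commit describes, here is a small sketch of replacing magic strings with Enums. The class names `Colorspace` and `Encoding` and all of their members are hypothetical and may not match the project's actual definitions.

```python
from enum import Enum


class Colorspace(Enum):
    # Hypothetical members, for illustration only
    gray = "gray"
    rgb = "rgb"
    cmyk = "cmyk"


class Encoding(Enum):
    # Hypothetical members, for illustration only
    ccitt = "ccitt"
    jpeg = "jpeg"
    jpeg2000 = "jpeg2000"


def is_color(colorspace: Colorspace) -> bool:
    """Return True for colorspaces that carry color information."""
    return colorspace in (Colorspace.rgb, Colorspace.cmyk)


if __name__ == "__main__":
    print(is_color(Colorspace.rgb))
    print(Colorspace("gray"))  # look up a member from a legacy string value
```

With Enums, a misspelled member such as `Colorspace.gary` raises `AttributeError` immediately, whereas a misspelled string would simply never compare equal.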
James R. Barlow
263f9b79f4 pageinfo: debug stuff 2017-05-18 21:52:55 -07:00
James R. Barlow
56d2aae963 Refactor from ImageInfo index to attribute accessing 2017-05-18 18:39:14 -07:00
James R. Barlow
127706153d Refactor dictionary based image info to ImageInfo 2017-05-18 18:26:31 -07:00
James R. Barlow
caee5b1428 Access PageInfo instance variables instead of dictionary 2017-05-18 17:12:04 -07:00
James R. Barlow
6c12e7e944 Refactor pageinfo dictionary to PageInfo() 2017-05-18 16:53:38 -07:00
James R. Barlow
cd04ae6949 Refactor PdfInfo(str(filename)) -> PdfInfo(filename) 2017-05-18 16:43:50 -07:00
James R. Barlow
6a0b68298f Refactor pdf_get_all_pageinfo to PdfInfo 2017-05-18 16:31:18 -07:00
James R. Barlow
0a2f732267 docs: Fix restructured text typos 2017-05-16 23:27:10 -07:00
James R. Barlow
4bade99f27 docs: Remark that someone got bash on Windows working 2017-05-16 23:24:34 -07:00
James R. Barlow
0b048cd24e Join the build badge club 2017-05-16 23:24:05 -07:00
James R. Barlow
c69ee63d82 Travis, true is a program, not a keyword v5.0.1 2017-05-15 15:12:14 -07:00
James R. Barlow
744fa104d7 v5.0.1 release notes (anticipating) 2017-05-14 23:59:09 -07:00
James R. Barlow
e24ff0fd64 Travis: don’t update the homebrew version because we pushed to testpypi 2017-05-14 23:55:40 -07:00
James R. Barlow
5de107d44c tesseract_cache: update explanatory notes 2017-05-14 23:54:09 -07:00