James R. Barlow
871979abd6
Temporarily unbreak without fitz mode
2018-05-11 17:32:15 -07:00
James R. Barlow
efb95722ca
Travis: Use declarative APT for Tesseract too
2018-05-11 12:46:10 -07:00
James R. Barlow
d9bbb80a6b
Don't try to run jbig2 when not available
2018-05-11 12:42:00 -07:00
James R. Barlow
3254315127
Update test cache
2018-05-11 12:19:50 -07:00
James R. Barlow
ca297fd26b
Update tests
2018-05-11 02:33:44 -07:00
James R. Barlow
ac36a43cef
Warn about --user-words not having any effect
...
Might be available in full release of Tess4
2018-05-11 02:31:07 -07:00
James R. Barlow
f00183115d
Update our dependencies
2018-05-11 02:11:55 -07:00
James R. Barlow
161b29a899
Check jbig2 when optimizing is requested
2018-05-11 02:11:01 -07:00
James R. Barlow
72253d09fa
Add arguments to control optimization
2018-05-10 22:23:24 -07:00
James R. Barlow
40d09ddb23
Fix merge error in Leptonica
2018-05-10 21:17:47 -07:00
James R. Barlow
3026d86a9e
Remove jbig2enc.py
2018-05-10 21:15:07 -07:00
James R. Barlow
0661a7edc3
Merge optimize
2018-05-10 21:05:32 -07:00
James R. Barlow
24b0adfacc
Merge branch 'master' into develop
2018-05-10 20:54:55 -07:00
James R. Barlow
acc6698ab3
Make XML metadata test actually work
2018-05-10 20:37:10 -07:00
James R. Barlow
606d3e6aa1
Remove tests that exercise obsolete features (tesseract, -g)
2018-05-10 20:33:32 -07:00
James R. Barlow
687a7954d6
test_main: uses leptonica
2018-05-10 19:05:31 -07:00
James R. Barlow
36a53a7b37
Weave: Unconditionally rotate and scale the text layer
...
This solves two issues. First, the text layer can end up being a
different size, probably if the DPI is not an integer; scaling helps it
fit slightly better. Second, other printable text on the page can end up
horizontally scaled or misaligned if we don't do all of our drawing in a
q/Q pair.
2018-05-10 19:03:31 -07:00
James R. Barlow
0a5982a902
PyMuPDF tweaks: don't clean
...
In MuPDF 1.13 clean might be unreliable, so explicitly don't do it,
even though it doesn't cause trouble in 1.12.
2018-05-10 18:50:52 -07:00
James R. Barlow
601863f9e9
Return to PyMuPDF 1.12.5
2018-05-10 18:47:10 -07:00
James R. Barlow
c9ce731119
Fix DPI mismatch between OCR page and source page
2018-05-10 17:34:08 -07:00
James R. Barlow
abed8e034e
Add metadata preservation test from stash
2018-05-10 16:43:28 -07:00
James R. Barlow
63032d304d
Revert "Since PyMuPDF 1.13.3 corrupts text, pin 1.12.5 and work around it"
...
This reverts commit b0ce7c63dd27257d9c979fde9013243b8ae38c98.
2018-05-10 16:27:17 -07:00
James R. Barlow
a57ecede78
Refactor textareas to remove duplicate code
2018-05-10 16:26:52 -07:00
James R. Barlow
b0ce7c63dd
Since PyMuPDF 1.13.3 corrupts text, pin 1.12.5 and work around it
2018-05-10 16:10:24 -07:00
James R. Barlow
d139a11c16
Weave: periodically save to prevent indefinite growth of open file list
2018-05-10 15:08:57 -07:00
James R. Barlow
aef043db0b
Revise parameter validation for output-type, pdf-renderer, lang
2018-05-10 14:53:22 -07:00
James R. Barlow
b8f3ead541
Remove tesseract renderer entirely
...
Grafting lets us work with older Tesseract versions as if they could use
sandwich, so there is no point in keeping it. It's been deprecated for a
long time now anyway.
2018-05-10 14:06:13 -07:00
James R. Barlow
e0bb898f29
Remove hocr debug renderer (-g)
...
The fact that this produces additional pages makes it a maintenance
burden. hocr can be debugged using hocrtransform.
2018-05-10 13:48:39 -07:00
James R. Barlow
45336c7c28
textareas: filter out images
2018-05-10 01:17:28 -07:00
James R. Barlow
20aabb2e83
When deciding if there is text on a page, ignore the margins
...
Margins may include watermarks or digital stamps on otherwise
text-free pages.
2018-05-10 01:16:11 -07:00
James R. Barlow
1539e24d61
Ignore masks when deciding what color to rasterize at
2018-05-10 00:49:36 -07:00
Fabian Rodriguez
c7cf041e4a
Fixed language option example (French) (#266)
...
Replace fre with fra.
2018-05-10 00:10:27 -07:00
James R. Barlow
da80d3f354
Add unconditional (for now) whiteout of text areas
2018-05-07 17:37:46 -07:00
James R. Barlow
001c8d7678
Upgrade PyMuPDF version
2018-05-07 16:24:26 -07:00
James R. Barlow
38ab03655b
Restore unpaper
...
It's a suggested/recommended dep not required in Deb/Ubu.
2018-05-06 21:36:12 -07:00
James R. Barlow
9226f8a5d1
Trap PDF/A-3 errors on old Ghostscript
v6.2.0
2018-05-04 15:29:43 -07:00
James R. Barlow
5c8a007f3e
Fix failure to prevent use of Ghostscript on /UserUnit files
2018-05-04 13:34:34 -07:00
James R. Barlow
b3ad3e297d
v6.2.0 fixes
2018-05-03 17:04:23 -07:00
James R. Barlow
d607553e48
v6.2.0 Release notes
2018-05-03 16:47:21 -07:00
James R. Barlow
7cf83c77ca
Merge branch 'feature/pdfa3'
2018-05-03 16:45:57 -07:00
James R. Barlow
8a9f174f63
Fix XMP validation issue with /CreationDate
...
Related to previous validation issue. If the /CreationDate had no
timezone, Ghostscript also creates invalid metadata. Work around this.
Also fix up PDF date decoding, and transcode dates to standardize them.
2018-05-03 16:30:20 -07:00
James R. Barlow
98a0786c32
Add 18.04 update procedure
2018-05-03 13:55:16 -07:00
James R. Barlow
df1129724c
Update Dockerfile for Ubuntu 18.04
2018-05-03 01:27:13 -07:00
James R. Barlow
423cef08bf
Handle procset properly
2018-05-02 14:48:02 -07:00
James R. Barlow
04580accb4
Document aliasing of tesseract renderer
2018-05-02 14:47:38 -07:00
James R. Barlow
6376f77b8c
Refactor, remove trigonometry
2018-05-02 12:30:34 -07:00
James R. Barlow
e27e614ed9
Fixed rotation hard case
2018-05-02 01:32:11 -07:00
James R. Barlow
b0c04704a1
Fixed all but one rotation case
2018-05-02 01:24:21 -07:00
James R. Barlow
6bb6bf8323
Fix correction angle used from wrong page
2018-05-02 01:00:30 -07:00
James R. Barlow
e22fe8aefc
Silence debug messages
2018-05-01 23:51:54 -07:00