James R. Barlow
07b638a394
pdfminer: detect TrueType fonts with no valid encoding information
2018-11-15 13:44:11 -08:00
James R. Barlow
9bee2405d8
Leptonica: make threshold functions more flexible
2018-11-15 13:43:34 -08:00
James R. Barlow
8f040491bf
Fix erasure of undetectable barcodes
2018-11-15 12:03:51 -08:00
James R. Barlow
8a18988706
Fix 'del draw' exception
2018-11-15 12:02:53 -08:00
James R. Barlow
47a954514b
Fix name2unicode ignoring certain markers
2018-11-15 12:02:30 -08:00
James R. Barlow
e3b65d4288
Fix detailed page analysis enabled at wrong time
2018-11-15 12:02:08 -08:00
James R. Barlow
4704f7ed1d
Add ReadTheDocs yml so we can build with Py3.6
2018-11-12 13:43:17 -08:00
James R. Barlow
3a2745445a
Fix docs build
2018-11-12 13:26:04 -08:00
James R. Barlow
12e15bab15
v7.3.0 release notes
v7.3.0
2018-11-11 02:05:52 -08:00
James R. Barlow
9593aa4fb9
Merge v7.3.0 development
2018-11-11 01:38:42 -08:00
James R. Barlow
817d520e63
Merge branch 'master' of github.com:jbarlow83/OCRmyPDF
2018-11-11 01:34:00 -08:00
James R. Barlow
700abbb8a5
Documentation for OCR quality features
2018-11-10 15:48:41 -08:00
James R. Barlow
701ef1df3f
Add threshold function to work around Tesseract's poor thresholding of bright backgrounds
2018-11-10 15:34:37 -08:00
James R. Barlow
0f5c484b62
Travis: only need to specify chardet because we use pip install --no-deps
2018-11-10 13:57:04 -08:00
James R. Barlow
cc7f2a3f02
Fix Python 3.5 pathlib regressions
2018-11-10 02:11:23 -08:00
James R. Barlow
755b5d87e3
Add missing chardet, implied by pdfminer.six?
2018-11-10 01:50:51 -08:00
James R. Barlow
e55a4115e1
Travis: pytest 3.10.0 internal error?
2018-11-10 01:44:05 -08:00
James R. Barlow
16a6fd2ea9
Update docs for --redo-ocr and --mask-barcodes
2018-11-10 01:34:33 -08:00
James R. Barlow
e3fce112ed
main.txt: wrong pdfminer
2018-11-10 01:32:27 -08:00
James R. Barlow
eacd26a68b
Mention v6.2.5 release
2018-11-10 01:10:45 -08:00
James R. Barlow
0e88b3c38a
Update v7.3.0 release notes
2018-11-10 01:09:19 -08:00
James R. Barlow
a2170ef8d6
test: test version check code
2018-11-10 00:56:22 -08:00
James R. Barlow
eed0424390
Update requirements
2018-11-10 00:56:04 -08:00
James R. Barlow
5ed05e08b1
Fix "no languages" test and misuse of os.environ
2018-11-09 01:57:11 -08:00
James R. Barlow
58b26f6715
Leptonica: learn to despeckle 1bpp images
2018-11-07 01:49:13 -08:00
James R. Barlow
806daf4284
leptonica: reduce boilerplate for PIX (2/2)
2018-11-06 20:33:40 -08:00
James R. Barlow
c64bc9329e
leptonica: reduce boilerplate for wrapper classes (except PIX)
2018-11-06 20:12:09 -08:00
James R. Barlow
dd01745519
Leptonica: add masked threshold fn
2018-11-06 19:31:06 -08:00
James R. Barlow
501ce726e7
Fix two failing tests
2018-11-06 11:16:08 -08:00
James R. Barlow
03076e89ce
Leptonica: reduce verbosity, more error trapping, more garbage collection
2018-11-06 11:10:59 -08:00
James R. Barlow
02f37293ee
Integrate barcode masking
2018-11-05 13:01:13 -08:00
James R. Barlow
590942ad14
Leptonica: Add barcode API
2018-11-05 01:48:38 -08:00
James R. Barlow
2ac028c759
test: Add a basic redo OCR test
2018-11-04 15:54:41 -08:00
James R. Barlow
2125b5bfab
Remove text detection from our parser interpret_contents
It's redundant now
2018-11-04 15:47:55 -08:00
James R. Barlow
b96532caa4
Only do detailed page analysis when needed by --redo-ocr
2018-11-04 15:40:49 -08:00
James R. Barlow
995fc58466
Move Ghostscript text analysis into its own module
2018-11-04 14:55:48 -08:00
James R. Barlow
c023cae299
Make pdfminer Type3 patch conditional on PScript5.dll
It appears that PDFs created by this software have a bug in their BBox
which will cause us to misjudge the space occupied by the font.
Other programs probably work around this by ignoring BBox and reading
each character procedure.
2018-11-04 01:53:53 -07:00
James R. Barlow
237eaf9130
Exception message not printed in some cases
Closes #310
2018-11-03 17:10:24 -07:00
James R. Barlow
8b9ab25125
coverage: test compile leptonica
2018-11-02 01:55:25 -07:00
James R. Barlow
77e87abe8f
coverage: ensure get_orientation is checked
2018-11-02 01:32:20 -07:00
James R. Barlow
3be02e1e8d
coverage: improve leptonica; don't create objects with null pointers
2018-11-02 01:10:10 -07:00
James R. Barlow
64c9ede979
leptonica: barcodes, BOXA
2018-11-02 00:42:01 -07:00
James R. Barlow
5b8d197812
coverage: make it more likely timeout is tested
2018-11-02 00:41:15 -07:00
James R. Barlow
2cba62dc4f
coverage: ensure rotation is actually tested
2018-11-02 00:40:56 -07:00
James R. Barlow
288e28328f
coverage: add qpdf
2018-11-02 00:37:33 -07:00
James R. Barlow
b8214b3c49
coverage: exclude unicodefun.py
2018-11-02 00:33:08 -07:00
James R. Barlow
8681693994
Set up code coverage (it works with multiprocessing now!)
2018-11-02 00:31:50 -07:00
James R. Barlow
1364c63b7c
Fix failure to pickle file with AcroForm
2018-11-01 20:07:53 -07:00
James R. Barlow
4ba9e8fe25
Add AcroForm detection
2018-10-30 22:28:44 -07:00
James R. Barlow
a195713bb4
Throw exception on corrupt text
2018-10-30 16:35:09 -07:00