2895 Commits

Author SHA1 Message Date
James R. Barlow
07b638a394 pdfminer: detect TrueType fonts with no valid encoding information 2018-11-15 13:44:11 -08:00
James R. Barlow
9bee2405d8 Leptonica: make threshold functions more flexible 2018-11-15 13:43:34 -08:00
James R. Barlow
8f040491bf Fix erasure of undetectable barcodes 2018-11-15 12:03:51 -08:00
James R. Barlow
8a18988706 Fix 'del draw' exception 2018-11-15 12:02:53 -08:00
James R. Barlow
47a954514b Fix name2unicode ignoring certain markers 2018-11-15 12:02:30 -08:00
James R. Barlow
e3b65d4288 Fix detailed page analysis enabled at wrong time 2018-11-15 12:02:08 -08:00
James R. Barlow
4704f7ed1d Add ReadTheDocs yml so we can build with Py3.6 2018-11-12 13:43:17 -08:00
James R. Barlow
3a2745445a Fix docs build 2018-11-12 13:26:04 -08:00
James R. Barlow
12e15bab15 v7.3.0 release notes (tag: v7.3.0) 2018-11-11 02:05:52 -08:00
James R. Barlow
9593aa4fb9 Merge v7.3.0 development 2018-11-11 01:38:42 -08:00
James R. Barlow
817d520e63 Merge branch 'master' of github.com:jbarlow83/OCRmyPDF 2018-11-11 01:34:00 -08:00
James R. Barlow
700abbb8a5 Documentation for OCR quality features 2018-11-10 15:48:41 -08:00
James R. Barlow
701ef1df3f Add threshold function to work around Tesseract's poor thresholding of bright backgrounds 2018-11-10 15:34:37 -08:00
James R. Barlow
0f5c484b62 Travis: only need to specify chardet because we use pip install --no-deps 2018-11-10 13:57:04 -08:00
James R. Barlow
cc7f2a3f02 Fix Python 3.5 pathlib regressions 2018-11-10 02:11:23 -08:00
James R. Barlow
755b5d87e3 Add missing chardet, implied by pdfminer.six? 2018-11-10 01:50:51 -08:00
James R. Barlow
e55a4115e1 Travis: pytest 3.10.0 internal error? 2018-11-10 01:44:05 -08:00
James R. Barlow
16a6fd2ea9 Update docs for --redo-ocr and --mask-barcodes 2018-11-10 01:34:33 -08:00
James R. Barlow
e3fce112ed main.txt: wrong pdfminer 2018-11-10 01:32:27 -08:00
James R. Barlow
eacd26a68b Mention v6.2.5 release 2018-11-10 01:10:45 -08:00
James R. Barlow
0e88b3c38a Update v7.3.0 release notes 2018-11-10 01:09:19 -08:00
James R. Barlow
a2170ef8d6 test: test version check code 2018-11-10 00:56:22 -08:00
James R. Barlow
eed0424390 Update requirements 2018-11-10 00:56:04 -08:00
James R. Barlow
5ed05e08b1 Fix "no languages" test and misuse of os.environ 2018-11-09 01:57:11 -08:00
James R. Barlow
58b26f6715 Leptonica: learn to despeckle 1bpp images 2018-11-07 01:49:13 -08:00
James R. Barlow
806daf4284 leptonica: reduce boilerplate for PIX (2/2) 2018-11-06 20:33:40 -08:00
James R. Barlow
c64bc9329e leptonica: reduce boilerplate for wrapper classes (except PIX) 2018-11-06 20:12:09 -08:00
James R. Barlow
dd01745519 Leptonica: add masked threshold fn 2018-11-06 19:31:06 -08:00
James R. Barlow
501ce726e7 Fix two failing tests 2018-11-06 11:16:08 -08:00
James R. Barlow
03076e89ce Leptonica: reduce verbosity, more error trapping, more garbage collection 2018-11-06 11:10:59 -08:00
James R. Barlow
02f37293ee Integrate barcode masking 2018-11-05 13:01:13 -08:00
James R. Barlow
590942ad14 Leptonica: Add barcode API 2018-11-05 01:48:38 -08:00
James R. Barlow
2ac028c759 test: Add a basic redo OCR test 2018-11-04 15:54:41 -08:00
James R. Barlow
2125b5bfab Remove text detection from our parser interpret_contents
It's redundant now
2018-11-04 15:47:55 -08:00
James R. Barlow
b96532caa4 Only do detailed page analysis when needed by --redo-ocr 2018-11-04 15:40:49 -08:00
James R. Barlow
995fc58466 Move Ghostscript text analysis into its own module 2018-11-04 14:55:48 -08:00
James R. Barlow
c023cae299 Make pdfminer Type3 patch conditional on PScript5.dll
It appears that PDFs created by this software have a bug in their BBox
which will cause us to misjudge the space occupied by the font.

Other programs probably work around this by ignoring BBox and reading
each character procedure.
2018-11-04 01:53:53 -07:00
James R. Barlow
237eaf9130 Exception message not printed in some cases
Closes #310
2018-11-03 17:10:24 -07:00
James R. Barlow
8b9ab25125 coverage: test compile leptonica 2018-11-02 01:55:25 -07:00
James R. Barlow
77e87abe8f coverage: ensure get_orientation is checked 2018-11-02 01:32:20 -07:00
James R. Barlow
3be02e1e8d coverage: improve leptonica; don't create objects with null pointers 2018-11-02 01:10:10 -07:00
James R. Barlow
64c9ede979 leptonica: barcodes, BOXA 2018-11-02 00:42:01 -07:00
James R. Barlow
5b8d197812 coverage: make it more likely timeout is tested 2018-11-02 00:41:15 -07:00
James R. Barlow
2cba62dc4f coverage: ensure rotation is actually tested 2018-11-02 00:40:56 -07:00
James R. Barlow
288e28328f coverage: add qpdf 2018-11-02 00:37:33 -07:00
James R. Barlow
b8214b3c49 coverage: exclude unicodefun.py 2018-11-02 00:33:08 -07:00
James R. Barlow
8681693994 Set up code coverage (it works with multiprocessing now!) 2018-11-02 00:31:50 -07:00
James R. Barlow
1364c63b7c Fix failure to pickle file with AcroForm 2018-11-01 20:07:53 -07:00
James R. Barlow
4ba9e8fe25 Add AcroForm detection 2018-10-30 22:28:44 -07:00
James R. Barlow
a195713bb4 Throw exception on corrupt text 2018-10-30 16:35:09 -07:00