2676 Commits

Author SHA1 Message Date
James R. Barlow
ad15e845f9 docs: Ghostscript PDF/A XMP metadata loss; ocrmypdf-webservice
[ci skip]
2018-12-17 23:20:49 -08:00
James R. Barlow
ab632f57cd v7.4.0 release notes v7.4.0 2018-12-15 15:27:23 -08:00
James R. Barlow
13d20bd993 pdfinfo: tolerate PDFs that overflow and underflow the graphics stack 2018-12-15 15:10:29 -08:00
James R. Barlow
b973208137 Require pikepdf 0.9.1 2018-12-15 14:23:10 -08:00
James R. Barlow
942abf8074 Fix reqs/main.txt for pikepdf 0.9.0 2018-12-14 23:29:26 -08:00
James R. Barlow
ed9bb985e2 Fix pikepdf 0.9.0 2018-12-14 23:21:13 -08:00
James R. Barlow
5a7a8e573b Require pikepdf 0.9.0 2018-12-14 23:06:57 -08:00
James R. Barlow
ce878db913 Rename to polyglot.dockerfile 2018-12-14 23:06:29 -08:00
James R. Barlow
a3d58683b2 Update webservice.py with separate license 2018-12-14 23:05:54 -08:00
James R. Barlow
039e8ca7e7 Merge branches 'feature/newer-pike' and 'feature/webapp' 2018-12-14 18:08:31 -08:00
James R. Barlow
0ebbd4e21b Don't open encrypted files, even if password is empty 2018-12-13 22:48:00 -08:00
James R. Barlow
2cb75f6076 Refactor pipeline to make PDF/A conversion a separate step 2018-12-13 20:48:48 -08:00
James R. Barlow
857d871364 Fix regression on Ghostscript path 2018-12-13 20:36:41 -08:00
James R. Barlow
632dab2cc0 Replace Ghostscript DOCINFO and fix 9.25 metadata date regression
We no longer use Ghostscript to manage PDF metadata, instead
omitting the DOCINFO segment from the pdfmark file we generate.

Instead, all of the relevant metadata code has been migrated to pikepdf,
and we use that API. This should be more consistent and fixes the
Ghostscript version-dependent quirks.

Also removes our python-xmp-toolkit dependency, except for
testing.
2018-12-13 18:13:30 -08:00
James R. Barlow
7647918f2d setup: suppress XMLParser() warning - defusedxml related 2018-12-12 22:13:32 -08:00
James R. Barlow
75c5d8055c pdfinfo: fix FutureWarning 2018-12-12 22:12:14 -08:00
James R. Barlow
a938bbea55 Remove more libxmp dependencies 2018-12-12 22:02:35 -08:00
James R. Barlow
414407fbd6 Deprecate encode/decode_pdf_date and remap to pikepdf version 2018-12-12 22:01:21 -08:00
James R. Barlow
076fc717df pdfa: replace PDF/A checking with pikepdf implementation 2018-12-12 21:41:16 -08:00
James R. Barlow
2a04b2d82b Rename webapp to webservice 2018-12-12 21:29:05 -08:00
James R. Barlow
065db414c0 webapp docker: Build from polyglot 2018-12-12 21:24:04 -08:00
James R. Barlow
19a054a78b Add webapp stuff 2018-12-10 20:03:52 -08:00
James R. Barlow
9df24a81b7 Fix comment in layout.py 2018-11-28 15:16:34 -08:00
James R. Barlow
40c0acd3f2 Support using --force-ocr and --threshold or --mask-barcodes together 2018-11-28 15:16:24 -08:00
James R. Barlow
20db7f0a8f leptonica: delete file junkpixt.png if created 2018-11-28 13:47:55 -08:00
James R. Barlow
e54f6ee37f v7.3.1 release notes v7.3.1 2018-11-16 02:13:41 -08:00
James R. Barlow
2da556bf79 Fix unsupported operand Decimal, float 2018-11-16 02:13:25 -08:00
James R. Barlow
b183ad8167 Fix barcodes error handling 2018-11-16 02:08:16 -08:00
James R. Barlow
9e6b54c7ed Add test case for Type3 fonts with no Unicode mapping 2018-11-15 21:54:26 -08:00
James R. Barlow
d3b334c10f Test case: true type font without Unicode mapping 2018-11-15 16:22:53 -08:00
James R. Barlow
622f2c4bab More argument checking 2018-11-15 15:59:38 -08:00
James R. Barlow
07b638a394 pdfminer: detect TrueType fonts with no valid encoding information 2018-11-15 13:44:11 -08:00
James R. Barlow
9bee2405d8 Leptonica: make threshold functions more flexible 2018-11-15 13:43:34 -08:00
James R. Barlow
8f040491bf Fix erasure of undetectable barcodes 2018-11-15 12:03:51 -08:00
James R. Barlow
8a18988706 Fix 'del draw' exception 2018-11-15 12:02:53 -08:00
James R. Barlow
47a954514b Fix name2unicode ignoring certain markers 2018-11-15 12:02:30 -08:00
James R. Barlow
e3b65d4288 Fix detailed page analysis enabled at wrong time 2018-11-15 12:02:08 -08:00
James R. Barlow
4704f7ed1d Add ReadTheDocs yml so we can build with Py3.6 2018-11-12 13:43:17 -08:00
James R. Barlow
3a2745445a Fix docs build 2018-11-12 13:26:04 -08:00
James R. Barlow
12e15bab15 v7.3.0 release notes v7.3.0 2018-11-11 02:05:52 -08:00
James R. Barlow
9593aa4fb9 Merge v7.3.0 development 2018-11-11 01:38:42 -08:00
James R. Barlow
817d520e63 Merge branch 'master' of github.com:jbarlow83/OCRmyPDF 2018-11-11 01:34:00 -08:00
James R. Barlow
700abbb8a5 Documentation for OCR quality features 2018-11-10 15:48:41 -08:00
James R. Barlow
701ef1df3f Add threshold function to work around Tesseract's poor thresholding of bright backgrounds 2018-11-10 15:34:37 -08:00
James R. Barlow
0f5c484b62 Travis: only need to specify chardet because we use pip install --no-deps 2018-11-10 13:57:04 -08:00
James R. Barlow
cc7f2a3f02 Fix Python 3.5 pathlib regressions 2018-11-10 02:11:23 -08:00
James R. Barlow
755b5d87e3 Add missing chardet, implied by pdfminer.six? 2018-11-10 01:50:51 -08:00
James R. Barlow
e55a4115e1 Travis: pytest 3.10.0 internal error? 2018-11-10 01:44:05 -08:00
James R. Barlow
16a6fd2ea9 Update docs for --redo-ocr and --mask-barcodes 2018-11-10 01:34:33 -08:00
James R. Barlow
e3fce112ed main.txt: wrong pdfminer 2018-11-10 01:32:27 -08:00