James R. Barlow
ad15e845f9
docs: Ghostscript PDF/A XMP metadata loss; ocrmypdf-webservice
[ci skip]
2018-12-17 23:20:49 -08:00
James R. Barlow
ab632f57cd
v7.4.0 release notes
v7.4.0
2018-12-15 15:27:23 -08:00
James R. Barlow
13d20bd993
pdfinfo: tolerate PDFs that overflow and underflow the graphics stack
2018-12-15 15:10:29 -08:00
James R. Barlow
b973208137
Require pikepdf 0.9.1
2018-12-15 14:23:10 -08:00
James R. Barlow
942abf8074
Fix reqs/main.txt for pikepdf 0.9.0
2018-12-14 23:29:26 -08:00
James R. Barlow
ed9bb985e2
Fix pikepdf 0.9.0
2018-12-14 23:21:13 -08:00
James R. Barlow
5a7a8e573b
Require pikepdf 0.9.0
2018-12-14 23:06:57 -08:00
James R. Barlow
ce878db913
Rename to polyglot.dockerfile
2018-12-14 23:06:29 -08:00
James R. Barlow
a3d58683b2
Update webservice.py with separate license
2018-12-14 23:05:54 -08:00
James R. Barlow
039e8ca7e7
Merge branches 'feature/newer-pike' and 'feature/webapp'
2018-12-14 18:08:31 -08:00
James R. Barlow
0ebbd4e21b
Don't open encrypted files, even if password is empty
2018-12-13 22:48:00 -08:00
James R. Barlow
2cb75f6076
Refactor pipeline to make PDF/A conversion a separate step
2018-12-13 20:48:48 -08:00
James R. Barlow
857d871364
Fix regression on Ghostscript path
2018-12-13 20:36:41 -08:00
James R. Barlow
632dab2cc0
Replace Ghostscript DOCINFO and fix 9.25 metadata date regression
We no longer use Ghostscript to manage PDF metadata; we now omit
the DOCINFO segment from the pdfmark file we generate.
All of the relevant metadata code has been migrated to pikepdf,
and we use its API instead. This should be more consistent, and it fixes
the Ghostscript version-dependent quirks.
Also removes our python-xmp-toolkit dependency, except for
testing.
2018-12-13 18:13:30 -08:00
James R. Barlow
7647918f2d
setup: suppress XMLParser() warning - defusedxml related
2018-12-12 22:13:32 -08:00
James R. Barlow
75c5d8055c
pdfinfo: fix FutureWarning
2018-12-12 22:12:14 -08:00
James R. Barlow
a938bbea55
Remove more libxmp dependencies
2018-12-12 22:02:35 -08:00
James R. Barlow
414407fbd6
Deprecate encode/decode_pdf_date and remap to pikepdf version
2018-12-12 22:01:21 -08:00
James R. Barlow
076fc717df
pdfa: replace PDF/A checking with pikepdf implementation
2018-12-12 21:41:16 -08:00
James R. Barlow
2a04b2d82b
Rename webapp to webservice
2018-12-12 21:29:05 -08:00
James R. Barlow
065db414c0
webapp docker: Build from polyglot
2018-12-12 21:24:04 -08:00
James R. Barlow
19a054a78b
Add webapp stuff
2018-12-10 20:03:52 -08:00
James R. Barlow
9df24a81b7
Fix comment in layout.py
2018-11-28 15:16:34 -08:00
James R. Barlow
40c0acd3f2
Support using --force-ocr and --threshold or --mask-barcodes together
2018-11-28 15:16:24 -08:00
James R. Barlow
20db7f0a8f
leptonica: delete file junkpixt.png if created
2018-11-28 13:47:55 -08:00
James R. Barlow
e54f6ee37f
v7.3.1 release notes
v7.3.1
2018-11-16 02:13:41 -08:00
James R. Barlow
2da556bf79
Fix unsupported operand types: Decimal and float
2018-11-16 02:13:25 -08:00
James R. Barlow
b183ad8167
Fix barcodes error handling
2018-11-16 02:08:16 -08:00
James R. Barlow
9e6b54c7ed
Add test case for Type3 fonts with no Unicode mapping
2018-11-15 21:54:26 -08:00
James R. Barlow
d3b334c10f
Test case: true type font without Unicode mapping
2018-11-15 16:22:53 -08:00
James R. Barlow
622f2c4bab
More argument checking
2018-11-15 15:59:38 -08:00
James R. Barlow
07b638a394
pdfminer: detect TrueType fonts with no valid encoding information
2018-11-15 13:44:11 -08:00
James R. Barlow
9bee2405d8
Leptonica: make threshold functions more flexible
2018-11-15 13:43:34 -08:00
James R. Barlow
8f040491bf
Fix erasure of undetectable barcodes
2018-11-15 12:03:51 -08:00
James R. Barlow
8a18988706
Fix 'del draw' exception
2018-11-15 12:02:53 -08:00
James R. Barlow
47a954514b
Fix name2unicode ignoring certain markers
2018-11-15 12:02:30 -08:00
James R. Barlow
e3b65d4288
Fix detailed page analysis enabled at wrong time
2018-11-15 12:02:08 -08:00
James R. Barlow
4704f7ed1d
Add ReadTheDocs yml so we can build with Py3.6
2018-11-12 13:43:17 -08:00
James R. Barlow
3a2745445a
Fix docs build
2018-11-12 13:26:04 -08:00
James R. Barlow
12e15bab15
v7.3.0 release notes
v7.3.0
2018-11-11 02:05:52 -08:00
James R. Barlow
9593aa4fb9
Merge v7.3.0 development
2018-11-11 01:38:42 -08:00
James R. Barlow
817d520e63
Merge branch 'master' of github.com:jbarlow83/OCRmyPDF
2018-11-11 01:34:00 -08:00
James R. Barlow
700abbb8a5
Documentation for OCR quality features
2018-11-10 15:48:41 -08:00
James R. Barlow
701ef1df3f
Add threshold function to work around Tesseract's poor thresholding of bright backgrounds
2018-11-10 15:34:37 -08:00
James R. Barlow
0f5c484b62
Travis: only need to specify chardet because we use pip install --no-deps
2018-11-10 13:57:04 -08:00
James R. Barlow
cc7f2a3f02
Fix Python 3.5 pathlib regressions
2018-11-10 02:11:23 -08:00
James R. Barlow
755b5d87e3
Add missing chardet, implied by pdfminer.six?
2018-11-10 01:50:51 -08:00
James R. Barlow
e55a4115e1
Travis: pytest 3.10.0 internal error?
2018-11-10 01:44:05 -08:00
James R. Barlow
16a6fd2ea9
Update docs for --redo-ocr and --mask-barcodes
2018-11-10 01:34:33 -08:00
James R. Barlow
e3fce112ed
main.txt: wrong pdfminer
2018-11-10 01:32:27 -08:00