James R. Barlow
7ba04267b1
Remove shims supporting old versions of pikepdf < 4
2021-11-13 00:43:20 -08:00
James R. Barlow
757b72b0af
Revert "Remove apparently unused portion of a test"
...
This reverts commit d89a633ba73af4a6bdacda6b9a4c0638b39167bd.
2021-04-16 00:21:11 -07:00
James R. Barlow
d673126994
Fix ZeroDivisionError on files containing images drawn at scale 0
...
Fixes #761
2021-04-15 23:26:14 -07:00
James R. Barlow
d89a633ba7
Remove apparently unused portion of a test
2021-04-15 23:25:18 -07:00
James R. Barlow
f687180ecc
tests: tidy pdfinfo
2021-01-08 15:04:52 -08:00
James R. Barlow
0b3a526049
Partial fix for crash on 'userunit' None (#700)
...
Our method of getting data from pdfminer would silently consume a StopIteration
if pdfminer returned no processed pages, leading to an odd error message.
We now handle the error from pdfminer properly and return a more
descriptive error of our own.
It would be possible for ocrmypdf to repair the file before sending it to
pdfminer, but this seems to be rare enough that we won't do that yet.
2021-01-01 01:11:32 -08:00
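The failure mode described in the commit above can be sketched in a few lines. This is an illustration only, not OCRmyPDF's or pdfminer's code: `first_page` and `NoPagesError` are hypothetical names, and a plain Python iterable stands in for whatever pdfminer returns. It shows how a bare `next()` on an empty iterator can be swallowed by an enclosing iterator, and how `next()` with a default makes the empty case explicit.

```python
class NoPagesError(Exception):
    """Hypothetical error raised when no pages were produced."""


def first_page(pages):
    """Return the first item from an iterable of pages.

    next() with a default never raises StopIteration, so an empty
    iterable is reported explicitly instead of leaking StopIteration
    to a caller that might swallow it.
    """
    sentinel = object()
    page = next(iter(pages), sentinel)
    if page is sentinel:
        raise NoPagesError("no pages were produced; the input may be damaged")
    return page


# How the silent variant can arise: map() calls next() on the empty
# iterator, the resulting StopIteration escapes map's own __next__, and
# list() treats it as a normal end of iteration.
swallowed = list(map(next, [iter([])]))
```

Here `swallowed` ends up as an empty list with no exception raised, whereas `first_page([])` raises `NoPagesError` with a message that names the actual problem.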
James R. Barlow
aa0ec40102
Change license of all GPLv3 files to MPL-2.0
...
https://github.com/jbarlow83/OCRmyPDF/issues/600
2020-08-05 00:44:42 -07:00
James R. Barlow
872bafad4b
Reinstate quick test for text/no text
...
Partial revert of commit 991db17
2020-06-10 12:00:52 -07:00
James R. Barlow
64891c2fc3
Pre-release delinting
2020-06-09 15:27:14 -07:00
James R. Barlow
0f942fb714
Rename ocrmypdf.exec -> ocrmypdf._exec
2020-06-09 14:59:09 -07:00
James R. Barlow
991db17fde
Remove Ghostscript-based text extraction
...
While faster than Python-based methods, we've outgrown the limited
amount of information Ghostscript provides with this feature, and it
repeats an analysis we have to do anyway to learn what images are
present.
2020-04-26 04:02:07 -07:00
James R. Barlow
94c52a6fa3
Refactor 'xyres' into Resolution
2020-04-24 04:12:05 -07:00
James R. Barlow
57771f06a3
Refactor xy-pair for resolution to tuple
2020-04-16 15:38:33 -07:00
James R. Barlow
23bc3d3a29
tests: workaround for Ghostscript 9.52 txtwrite problem
2020-03-29 22:45:16 -07:00
James R. Barlow
c5edff2c2f
Sort imports
2019-12-19 15:31:18 -08:00
James R. Barlow
4ab0a8ff35
Fix test_single_page_inline_image - remove temp file
2019-12-04 17:13:51 -08:00
James R. Barlow
6fbeb6347d
Merge api (without plugins)
2019-07-27 02:04:01 -07:00
James R. Barlow
12769b96e5
Drop support for omitting pdfminer.six
2019-07-10 13:37:01 -07:00
James R. Barlow
c357d4146e
Restructure ocrmypdf.pdfinfo
2019-06-20 03:10:41 -07:00
James R. Barlow
7d330afd81
Delinting
2019-01-02 13:34:45 -08:00
James R. Barlow
c771938907
Convert to f-strings where it makes sense
2018-12-31 15:01:19 -08:00
James R. Barlow
8c0009c5c8
Make pdfminer.six optional
...
Mainly since the current release of pdfminer.six lacks an sdist, blocking
homebrew packaging. Also in case other distros don't accept pdfminer.six.
2018-12-31 01:08:43 -08:00
James R. Barlow
0880b16491
Sort imports with isort
2018-12-30 01:28:15 -08:00
James R. Barlow
06308a22ce
Reformat with black
2018-12-30 01:27:49 -08:00
James R. Barlow
13d20bd993
pdfinfo: tolerate PDFs that overflow and underflow the graphics stack
2018-12-15 15:10:29 -08:00
James R. Barlow
9e6b54c7ed
Add test case for Type3 fonts with no Unicode mapping
2018-11-15 21:54:26 -08:00
James R. Barlow
d3b334c10f
Test case: TrueType font without Unicode mapping
2018-11-15 16:22:53 -08:00
James R. Barlow
501ce726e7
Fix two failing tests
2018-11-06 11:16:08 -08:00
James R. Barlow
f564aaf485
Remove only_ocr_text
2018-10-28 22:41:18 -07:00
James R. Barlow
58cc70725e
Reorganize around getting bboxes for visible/invisible text
2018-10-26 01:07:02 -07:00
James R. Barlow
16af753206
Add functional "redo OCR" feature
...
Needs argument validation and some other changes. Needs testing
with mixed-content PDFs.
Only really works for pure invisible text at the moment.
2018-10-19 00:02:19 -07:00
James R. Barlow
b18e66e2ca
pdfinfo: learn to detect vector graphic objects
2018-10-18 01:21:51 -07:00
James R. Barlow
216d60ea2c
pdfinfo: improve the regex
2018-07-04 00:59:32 -07:00