James R. Barlow
f51164aff8
Upgrade test version of pymupdf
2021-11-13 00:53:41 -08:00
James R. Barlow
6f58a14351
pdfa: remove deprecated pkg_resources based access and tests
2021-11-13 00:52:03 -08:00
James R. Barlow
7ba04267b1
Remove shims that supported old versions of pikepdf < 4
2021-11-13 00:43:20 -08:00
James R. Barlow
c725bf79da
flake8 delinting
2021-09-21 16:37:03 -07:00
James R. Barlow
e788dde607
tests: eliminate unnecessary mmap
2021-04-07 02:11:31 -07:00
James R. Barlow
aa115a8be3
Remove pytest_helpers_namespace
2021-04-07 01:56:51 -07:00
James R. Barlow
72fa347c38
tests: skip metadata test for two pikepdf versions that warn incorrectly
2020-12-29 01:47:52 -08:00
James R. Barlow
babc76fa74
tests: assert that most patched functions are called
...
We were not actually checking whether the functions we patched were called
when expected.
2020-12-28 23:58:33 -08:00
James R. Barlow
3707af3b74
Change pdf.root to pdf.Root
2020-11-03 01:30:31 -08:00
James R. Barlow
aa0ec40102
Change license of all GPLv3 files to MPL-2.0
...
https://github.com/jbarlow83/OCRmyPDF/issues/600
2020-08-05 00:44:42 -07:00
James R. Barlow
ebfe4f0d29
Fix issue #582 - PDF/A acquires title "Untitled" after conversion
2020-06-20 02:01:16 -07:00
James R. Barlow
7b9025f397
Convert generate_pdfa to plugin
2020-06-08 22:28:38 -07:00
James R. Barlow
b109445215
Move Ghostscript rasterize_pdf to plugin
2020-06-08 17:10:27 -07:00
James R. Barlow
1598f2f0e5
Abolish spoof_tesseract_noop
2020-06-01 03:07:53 -07:00
James R. Barlow
9af94ac9b7
pipeline: use OCR engine abstraction instead of Tesseract
2020-05-16 01:28:56 -07:00
James R. Barlow
977665d2b6
Delint some tests
2020-05-08 03:49:33 -07:00
James R. Barlow
c85278b31d
Delinting
2020-05-03 00:53:29 -07:00
James R. Barlow
5dbc080fa0
Rename PDFContext->PdfContext
2020-05-02 04:32:46 -07:00
James R. Barlow
e02f6c1e97
Support plugin invocation with API
2020-05-02 03:34:31 -07:00
James R. Barlow
b3b61c152c
Handle malformed DocumentInfo (#497)
...
User submitted a PDF in which /Trailer /Info pointed to the XMP metadata
block instead of a DocumentInfo dictionary. Fix and add test.
2020-03-03 03:27:01 -08:00
James R. Barlow
4a27124eab
Simplify metadata for invalid xml in output
...
Removes possibly non-free resource enron1.pdf.
2020-02-12 00:07:18 -08:00
James R. Barlow
c5edff2c2f
Sort imports
2019-12-19 15:31:18 -08:00
James R. Barlow
a3726e4ce3
Fix test_metadata: use mmap in a Windows and POSIX compatible way
2019-12-04 17:13:52 -08:00
James R. Barlow
6fbeb6347d
Merge api (without plugins)
2019-07-27 02:04:01 -07:00
James R. Barlow
12769b96e5
Drop support for omitting pdfminer.six
2019-07-10 13:37:01 -07:00
James R. Barlow
fb933edc0f
Use newer pytest tmp_path API
2019-06-01 01:55:51 -07:00
James R. Barlow
ef1ef1cdf0
Fix test invalidated by Python 3.6 logging fixes
2019-05-17 15:20:07 -07:00
James R. Barlow
c904b430b6
Merge master into api branch; all tests pass
2019-05-14 16:33:02 -07:00
James R. Barlow
482cb788ed
Don't use MagicMock() as a dummy logger in pytest
2019-05-11 12:44:17 -07:00
mawi
c92ccc6134
fix: tests
2019-04-08 14:57:42 +02:00
mawi
783a128bd1
feat: move to sync (non-ETL) implementation - remove ruffus
2019-04-04 21:02:38 +02:00
James R. Barlow
3f1d9ef99c
Fix tests for move to Alpine dockerfile
2019-02-26 12:30:21 -08:00
James R. Barlow
f34b3015b2
Prevent Ghostscript from generating invalid XMP metadata
...
If DocumentInfo contains NULs, Ghostscript will generate XMP with
NULs, which is not allowed. Repair DocumentInfo before Ghostscript sees it.
2019-01-04 13:20:41 -08:00
James R. Barlow
7d330afd81
Delinting
2019-01-02 13:34:45 -08:00
James R. Barlow
c771938907
Convert to f-strings where it makes sense
2018-12-31 15:01:19 -08:00
James R. Barlow
cfc5cdf47d
pdfa: remove a pile of deprecated code
...
It's now handled in pikepdf.
2018-12-31 00:05:13 -08:00
James R. Barlow
0880b16491
Sort imports with isort
2018-12-30 01:28:15 -08:00
James R. Barlow
06308a22ce
Reformat with black
2018-12-30 01:27:49 -08:00
James R. Barlow
72b920eb16
Drop support for Python 3.5
2018-12-30 00:23:26 -08:00
James R. Barlow
b4a51907d6
Detect when metadata is dropped during PDF/A conversion
2018-12-30 00:13:25 -08:00
James R. Barlow
ed9bb985e2
Fix pikepdf 0.9.0
2018-12-14 23:21:13 -08:00
James R. Barlow
632dab2cc0
Replace Ghostscript DOCINFO and fix 9.25 metadata date regression
...
We no longer use Ghostscript to manage PDF metadata, instead
omitting the DOCINFO segment from the pdfmark file we generate.
Instead all of the relevant metadata code has been migrated to pikepdf,
and we use that API. This should be more consistent and fixes the
Ghostscript version-dependent quirks.
Also removes our python-xmp-toolkit dependency, except for
testing.
2018-12-13 18:13:30 -08:00
James R. Barlow
414407fbd6
Deprecate encode/decode_pdf_date and remap to pikepdf version
2018-12-12 22:01:21 -08:00
James R. Barlow
517b385fe5
Work around loss of Unicode DOCINFO in Ghostscript 9.24+
...
Ghostscript no longer supports UTF-16-BE-hex strings as a way of
supplying Unicode data in pdfmark so we have lost this functionality too:
http://git.ghostscript.com/?p=ghostpdl.git;a=commit;h=e997c6836d243ab37fe3a5f0d57974af95eb5eac
For users this means setting --title, --author, etc. will not work if gs
9.24 is installed, but if the file has existing metadata it might work.
For now we enforce police-state-strict ASCII, until there's time to
implement proper metadata editing. Relevant tests set to xfail.
2018-09-13 21:33:39 -07:00
James R. Barlow
795019b0c1
Work around invalid TOC entries
...
Kodak Capture Desktop and probably other software creates a
/Outlines entry with /First being set to an invalid indirect reference to
an object that hasn't been created. This is legal in the PDF spec but
problematic for qpdf. The objgen will be (max valid object ID + 1, 0).
Because we create new objects in _weave, some TOC entries will end
up assigned to new objects we create, typically /ProcSet.
We solve the issue by refactoring page traversal and then doing it
twice, once to resolve all references (eliminating the null
reference problem) and a second pass to make our changes.
2018-09-11 14:44:16 -07:00
James R. Barlow
3aac3a98ca
tests: Migrate metadata tests to pikepdf
...
For some reason PyPDF2 has begun to trigger internal errors in
pytest on macOS alone. Not sure why, but nothing is wrong that I can
see. Seemed like an opportune time to switch to pikepdf; found some
new issues in the process anyway.
2018-09-10 16:06:01 -07:00
James R. Barlow
1cc9d2d3d1
Fix path error on Py3.5
2018-07-08 01:01:06 -07:00
James R. Barlow
58642aa98b
Fix issue #275: doesn't work when installed in non-Unicode path
...
Closes #275
2018-07-07 01:35:05 -07:00
James R. Barlow
45cb4525cf
Remove other references to PyMuPDF
2018-06-13 01:02:53 -07:00
James R. Barlow
3b820ffa7b
test_metadata: change from xfail to skipif without fitz
2018-05-17 00:14:57 -07:00