OCRmyPDF/tests/test_graft.py

# © 2019 James R. Barlow: github.com/jbarlow83
#
# This file is part of OCRmyPDF.
#
# OCRmyPDF is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# OCRmyPDF is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with OCRmyPDF.  If not, see <http://www.gnu.org/licenses/>.

from unittest.mock import patch

import pikepdf
import pytest

import ocrmypdf


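def test_no_glyphless_graft(resources, outdir):
    """Grafting must not fail when no Glyphless OCR font is present (issue #347)."""
    # Build a multi-page input by concatenating three test files.
    with pikepdf.open(resources / 'francais.pdf') as pdf, pikepdf.open(
        resources / 'aspect.pdf'
    ) as pdf_aspect, pikepdf.open(resources / 'cmyk.pdf') as pdf_cmyk:
        pdf.pages.extend(pdf_aspect.pages)
        pdf.pages.extend(pdf_cmyk.pages)
        pdf.save(outdir / 'test.pdf')

    # Lower the page threshold so the graft handoff occurs on this short file.
    # With tesseract_timeout=0 no OCR text is produced, so the output has no
    # Glyphless font; the handoff must still work.
    with patch('ocrmypdf._graft.MAX_REPLACE_PAGES', 2):
        ocrmypdf.ocr(
            outdir / 'test.pdf',
            outdir / 'out.pdf',
            deskew=True,
            tesseract_timeout=0,
            force_ocr=True,
        )
    # TODO: add assertions on out.pdf; for now this only checks that
    # ocrmypdf.ocr() completes without raising.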


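def test_links(resources, outpdf):
    """Internal links must still point at the right pages after grafting."""
    ocrmypdf.ocr(
        resources / 'link.pdf', outpdf, redo_ocr=True, oversample=200, output_type='pdf'
    )
    with pikepdf.open(outpdf) as pdf:
        p1 = pdf.pages[0]
        p2 = pdf.pages[1]
        # Each page's first link annotation has a destination array whose
        # first element is the target page: page 1 links to page 2 and
        # page 2 links back to page 1.
        assert p1.Annots[0].A.D[0].objgen == p2.objgen
        assert p2.Annots[0].A.D[0].objgen == p1.objgen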