=====================
Common error messages
=====================

Page already has text
=====================

.. code-block::

    ERROR - 1: page already has text! aborting (use --force-ocr to force OCR)

You ran ``ocrmypdf`` on a file that already contains printable text or a
hidden OCR text layer (OCRmyPDF cannot reliably tell the two apart). You
probably don't want to do this, because the file is already searchable.

As the error message suggests, your options are:

- ``ocrmypdf --force-ocr`` to :ref:`rasterize <raster-vector>` all
  vector content and run OCR on the images. This is useful if a
  previous OCR program failed, or if the document contains a text
  watermark.
- ``ocrmypdf --skip-text`` to skip OCR and other processing on any
  pages that contain text. Text pages will be copied into the output
  PDF without modification.

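As a minimal sketch of the two options above, the commands might look like
this. The file names ``input.pdf`` and ``output.pdf`` are placeholders, not
names that OCRmyPDF requires; run one command or the other, not both.

.. code-block:: bash

    # Placeholder file names (assumptions); substitute your own.
    input=input.pdf
    output=output.pdf

    # Option 1: rasterize every page and run OCR again
    ocrmypdf --force-ocr "$input" "$output"

    # Option 2: copy pages that already contain text unchanged, OCR the others
    ocrmypdf --skip-text "$input" "$output"
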
Input file 'filename' is not a valid PDF
========================================

OCRmyPDF passes files through qpdf, a program that fixes errors in PDFs,
before it tries to work on them. In most cases this error appears because
the PDF is corrupt or truncated (for example, by incomplete file copying),
and not much can be done.

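If qpdf is installed as a standalone command, you may be able to see what it
reports about the file directly. This is only a sketch: ``input.pdf`` is a
placeholder name, and the output of a manual check will not necessarily match
what OCRmyPDF does internally.

.. code-block:: bash

    # Placeholder file name (assumption); substitute your own.
    input=input.pdf

    # Ask qpdf to check the file's structure and report any problems it finds
    qpdf --check "$input"
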
You can try rewriting the file with Ghostscript:

.. code-block:: bash

    gs -o output.pdf -dSAFER -sDEVICE=pdfwrite input.pdf

``pdftk`` can also rewrite PDFs:

.. code-block:: bash

    pdftk input.pdf cat output output.pdf

Sometimes Acrobat can repair PDFs with its `Preflight
tool <https://helpx.adobe.com/acrobat/using/correcting-problem-areas-preflight-tool.html>`__.
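
Putting the Ghostscript suggestion together with a second OCR attempt, a
workflow could be sketched as follows. The file names, including the
intermediate ``repaired.pdf``, are placeholders chosen for illustration, and
a successful rewrite does not guarantee that OCRmyPDF will accept the result.

.. code-block:: bash

    # Placeholder file names (assumptions); substitute your own.
    input=input.pdf
    repaired=repaired.pdf
    output=output.pdf

    # Rewrite the PDF with Ghostscript, then run OCR only if the rewrite succeeded
    if gs -o "$repaired" -dSAFER -sDEVICE=pdfwrite "$input"; then
        ocrmypdf "$repaired" "$output"
    else
        echo "Ghostscript could not rewrite $input" >&2
    fi

Keeping the rewritten file separate from the original means the original is
still available if the rewrite drops content.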