
.. SPDX-FileCopyrightText: 2022 James R. Barlow
..
.. SPDX-License-Identifier: CC-BY-SA-4.0

=====================
Common error messages
=====================

Page already has text
=====================

.. code-block::

   ERROR -    1: page already has text! – aborting (use --force-ocr to force OCR)

You ran ocrmypdf on a file that already contains printable text or a
hidden OCR text layer (OCRmyPDF cannot reliably tell the two apart). You
probably don't want to do this, because the file is already searchable.

As the error message suggests, your options are:

-  ``ocrmypdf --force-ocr`` to :ref:`rasterize <raster-vector>` all
   vector content and run OCR on the images. This is useful if a
   previous OCR program failed, or if the document contains a text
   watermark.
-  ``ocrmypdf --skip-text`` to skip OCR and other processing on any
   pages that contain text. Text pages will be copied into the output
   PDF without modification.
-  ``ocrmypdf --redo-ocr`` to scan the file for any existing OCR
   (non-printing text), remove it, and do OCR again. This is one way
   to take advantage of improvements in OCR accuracy. Printable vector
   text is excluded from OCR, so this can be used on files that contain
   a mix of digital and scanned pages.
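If you process files in bulk, you may prefer to fall back to one of these
options only when this error actually occurs. The script below is a minimal
sketch of that idea, not a feature of OCRmyPDF: the function name
``ocr_or_skip_text`` is hypothetical, and it assumes that ocrmypdf exits with
code 6 when it aborts because a page already has text. Check the exit codes
documented for your installed version before relying on it.

```shell
# Hypothetical bash helper; source this file or paste it into your shell.
# Runs OCR normally and retries with --skip-text only if ocrmypdf aborted
# because the input already has text.
# Assumption: exit code 6 means "page already has text".

ocr_or_skip_text() {
    local input="$1" output="$2"
    local status=0

    # First attempt: plain OCR; capture the exit code without aborting
    ocrmypdf "$input" "$output" || status=$?

    if [ "$status" -eq 6 ]; then
        echo "Existing text found in $input; retrying with --skip-text" >&2
        ocrmypdf --skip-text "$input" "$output"
        return $?
    fi

    # Success (0) or any other error is passed through unchanged
    return "$status"
}
```

To use it, run ``ocr_or_skip_text input.pdf output.pdf``. Replace
``--skip-text`` with ``--redo-ocr`` or ``--force-ocr`` if one of those better
suits your files.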


Input file 'filename' is not a valid PDF
========================================

Before it tries to work on a file, OCRmyPDF checks it with pikepdf, a library
that in turn uses libqpdf to fix errors in PDFs. In most cases this error occurs
because the PDF is corrupt or truncated (for example, by an incomplete file
copy), and not much can be done.

You can try rewriting the file with Ghostscript:

.. code-block:: bash

    gs -o output.pdf -dSAFER -sDEVICE=pdfwrite input.pdf

``pdftk`` can also rewrite PDFs:

.. code-block:: bash

    pdftk input.pdf cat output output.pdf

Sometimes Acrobat can repair PDFs with its `Preflight
tool <https://helpx.adobe.com/acrobat/using/correcting-problem-areas-preflight-tool.html>`__.
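If one repair tool fails, you can try the next. The following is a minimal
sketch that chains the Ghostscript and ``pdftk`` commands above; the function
name ``repair_pdf`` is hypothetical, and it assumes at least one of the two
tools is installed. A successful rewrite does not guarantee that every page was
recovered, so check the repaired file before running OCR on it.

```shell
# Hypothetical bash helper: try to rewrite a damaged PDF with Ghostscript,
# falling back to pdftk if Ghostscript fails or is missing.
# Prints the name of the tool that succeeded.

repair_pdf() {
    local input="$1" output="$2"

    # Same Ghostscript invocation as above, with parameterized file names
    if gs -o "$output" -dSAFER -sDEVICE=pdfwrite "$input" >/dev/null 2>&1; then
        echo "gs"
        return 0
    fi

    # Fallback: same pdftk invocation as above
    if pdftk "$input" cat output "$output" >/dev/null 2>&1; then
        echo "pdftk"
        return 0
    fi

    echo "could not repair $input" >&2
    return 1
}
```

For example, ``repair_pdf input.pdf repaired.pdf && ocrmypdf repaired.pdf
output.pdf`` runs OCR only if one of the rewrites succeeded.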