Mirror of https://github.com/ocrmypdf/OCRmyPDF.git (synced 2025-12-28 23:49:33 +00:00)

Document return codes

Parent: e75b6280fd
Commit: 10aadefd6a

@ -27,8 +27,8 @@ If you want to adjust the amount of time spent on OCR, change ``--tesseract-time

.. code-block:: bash

    # Allow 300 seconds for OCR; skip any page larger than 50 megapixels
    ocrmypdf --tesseract-timeout 300 --skip-big 50 bigfile.pdf output.pdf

Overriding default tesseract
""""""""""""""""""""""""""""

@ -39,20 +39,20 @@ Some relevant environment variables that influence Tesseract's behavior include:

.. envvar:: TESSDATA_PREFIX

    Overrides the path to Tesseract's data files. This can allow simultaneous installation of the "best" and "fast" training data sets. OCRmyPDF does not manage this environment variable.

.. envvar:: OMP_THREAD_LIMIT

    Controls the number of threads Tesseract will use. OCRmyPDF will manage this environment variable if it is not already set. (Currently, it will set it to 1 because this gives the best results in testing.)

For example, if you are testing Tesseract 4.00 and don't wish to use an existing Tesseract 3.04 installation, you can launch OCRmyPDF as follows:

.. code-block:: bash

    env \
        PATH=/home/user/src/tesseract4/api:$PATH \
        TESSDATA_PREFIX=/home/user/src/tesseract4 \
        ocrmypdf --tesseract-oem 2 input.pdf output.pdf

In this example ``TESSDATA_PREFIX`` directs Tesseract 4.0 to use LSTM training data. ``--tesseract-oem 2`` selects the combined legacy and LSTM engines; use ``--tesseract-oem 1`` to request only Tesseract 4.0's new LSTM engine. (Tesseract 4.0 only.)

||||
@ -80,19 +80,19 @@ Create a file named "no-dict.cfg" with these contents:

::

    load_system_dawg 0
    language_model_penalty_non_dict_word 0
    language_model_penalty_non_freq_dict_word 0

then run ocrmypdf as follows (along with any other desired arguments):

.. code-block:: bash

    ocrmypdf --tesseract-config no-dict.cfg input.pdf output.pdf

.. warning::

    Some combinations of control parameters will break Tesseract or break assumptions that OCRmyPDF makes about Tesseract's output.


Changing the PDF renderer

@ -136,4 +136,60 @@ The ``tesseract`` renderer creates a PDF with the image and text layers precompo

If a PDF created with this renderer using Tesseract versions older than 3.05.00 is then passed through Ghostscript's pdfwrite feature, the OCR text *may* be corrupted. The ``--output-type=pdfa`` argument will produce a warning in this situation.

*This renderer is deprecated and will be removed whenever support for older versions of Tesseract is dropped.*


Return code policy
------------------

OCRmyPDF writes all messages to ``stderr``. ``stdout`` is reserved for piping
output files. ``stdin`` is reserved for piping input files.

The return codes generated by OCRmyPDF are considered part of the stable
user interface.

.. list-table:: Return codes
    :widths: 5 35 60
    :header-rows: 1

    * - Code
      - Name
      - Interpretation
    * - 0
      - ``ocrmypdf.exceptions.ExitCode.ok``
      - Everything worked as expected.
    * - 1
      - ``ocrmypdf.exceptions.ExitCode.bad_args``
      - Invalid arguments, exited with an error.
    * - 2
      - ``ocrmypdf.exceptions.ExitCode.input_file``
      - The input file does not seem to be a valid PDF.
    * - 3
      - ``ocrmypdf.exceptions.ExitCode.missing_dependency``
      - An external program required by OCRmyPDF is missing.
    * - 4
      - ``ocrmypdf.exceptions.ExitCode.invalid_output_pdf``
      - An output file was created, but it does not seem to be a valid PDF or
        PDF/A. The file will be available.
    * - 5
      - ``ocrmypdf.exceptions.ExitCode.file_access_error``
      - The user running OCRmyPDF does not have sufficient permissions to read the input file and write the output file.
    * - 6
      - ``ocrmypdf.exceptions.ExitCode.already_done_ocr``
      - The file already appears to contain text so it may not need OCR. See output message.
    * - 7
      - ``ocrmypdf.exceptions.ExitCode.child_process_error``
      - An error occurred in an external program (child process) and OCRmyPDF cannot continue.
    * - 8
      - ``ocrmypdf.exceptions.ExitCode.encrypted_pdf``
      - The input PDF is encrypted. OCRmyPDF does not read encrypted PDFs. Use another program such as ``qpdf`` to remove encryption.
    * - 9
      - ``ocrmypdf.exceptions.ExitCode.invalid_config``
      - A custom configuration file was forwarded to Tesseract using ``--tesseract-config``, and Tesseract rejected this file.
    * - 15
      - ``ocrmypdf.exceptions.ExitCode.other_error``
      - Some other error occurred.
    * - 130
      - ``ocrmypdf.exceptions.ExitCode.ctrl_c``
      - The program was interrupted by pressing Ctrl+C.

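Because the return codes are stable, a calling script can branch on them. The following is a minimal sketch, not part of OCRmyPDF: the function names ``ocrmypdf_exit_name`` and ``ocr_or_report`` are hypothetical helpers introduced here for illustration, and the short names simply mirror the table above.

.. code-block:: bash

    #!/usr/bin/env bash

    # Hypothetical helper (not shipped with OCRmyPDF): translate an exit
    # status from the return code table into its short name.
    ocrmypdf_exit_name() {
        case "$1" in
            0)   echo "ok" ;;
            1)   echo "bad_args" ;;
            2)   echo "input_file" ;;
            3)   echo "missing_dependency" ;;
            4)   echo "invalid_output_pdf" ;;
            5)   echo "file_access_error" ;;
            6)   echo "already_done_ocr" ;;
            7)   echo "child_process_error" ;;
            8)   echo "encrypted_pdf" ;;
            9)   echo "invalid_config" ;;
            15)  echo "other_error" ;;
            130) echo "ctrl_c" ;;
            *)   echo "unknown" ;;
        esac
    }

    # Hypothetical wrapper: run OCRmyPDF on input "$1", writing to "$2",
    # and report any non-zero exit status on stderr before passing it on.
    ocr_or_report() {
        ocrmypdf "$1" "$2"
        local rc=$?
        if [ "$rc" -ne 0 ]; then
            echo "ocrmypdf exited with $rc ($(ocrmypdf_exit_name "$rc"))" >&2
        fi
        return "$rc"
    }

A script might then call ``ocr_or_report input.pdf output.pdf`` and decide for itself whether a status such as 6 (the file already appears to contain text) should count as a failure.
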
@ -198,4 +198,3 @@ The `Image processing`_ features can improve OCR quality.

Rotating pages and deskewing help to ensure that the page orientation is correct before OCR begins. Removing the background and/or cleaning the page can also improve results. The ``--oversample DPI`` argument can be specified to resample images to higher resolution before attempting OCR; this can improve results as well.

OCR quality will suffer if the resolution of input images is not correct (since the range of pixel sizes that will be checked for possible fonts will also be incorrect).
