=======
Plugins
=======

You can use plugins to customize the behavior of OCRmyPDF at certain points of
interest.

Currently, it is possible to:

- add new command line arguments
- override the decision for whether or not to perform OCR on a particular file
- modify the image that is about to be sent for OCR
- modify the page image before it is converted to PDF

OCRmyPDF plugins are based on the Python ``pluggy`` package and conform to its
conventions. Note that plugins installed as setuptools entry points are
not checked currently, because OCRmyPDF assumes you may not want to enable
plugins for all files.

How plugins are imported
========================

Plugins are imported on demand by the OCRmyPDF worker process that needs to use
them. As such, plugins cannot share state with other plugins, cannot rely on
their module's or the interpreter's global state, and should expect other
copies of themselves to be running concurrently. Plugins can write intermediate
files to the folder specified in ``options.work_folder``.

Plugins should work whether executed in threads or processes.
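
As a minimal sketch of the guidance above, a plugin can keep per-page data in
files under ``options.work_folder`` instead of in module-level variables. The
helper name ``scratch_path``, the ``myplugin_`` file prefix, and the
``SimpleNamespace`` stand-in for the options object are hypothetical and used
only for illustration.

.. code-block:: python

    import tempfile
    from pathlib import Path
    from types import SimpleNamespace


    def scratch_path(options, page_number, suffix):
        """Return a per-page scratch file path inside options.work_folder.

        Hypothetical helper: the file name depends only on the arguments, so
        workers handling different pages never touch the same file and no
        module-level state is needed.
        """
        return Path(options.work_folder) / f"myplugin_{page_number:06d}{suffix}"


    # Stand-in for the options object a plugin would receive (assumption).
    with tempfile.TemporaryDirectory() as tmp:
        options = SimpleNamespace(work_folder=tmp)
        path = scratch_path(options, 3, ".txt")
        path.write_text("intermediate result for page 3")
        saved = path.read_text()
        name = path.name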

Script plugins
==============

Script plugins may be called from the command line by specifying the name of a file.
Script plugins may be convenient for informal or "one-off" plugins, for example
when a certain batch of files needs a special processing step.

.. code-block:: bash

    ocrmypdf --plugin ocrmypdf_example_plugin.py input.pdf output.pdf

Multiple plugins may be called by issuing the ``--plugin`` argument multiple times.
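
For illustration, the contents of a script plugin file such as
``ocrmypdf_example_plugin.py`` might look like the sketch below. The
``--grayscale-ocr`` option is hypothetical, ``parser`` is assumed to behave like
an ``argparse.ArgumentParser``, and the ``ImportError`` fallback exists only so
the sketch can be run without OCRmyPDF installed.

.. code-block:: python

    try:
        from ocrmypdf import hookimpl
    except ImportError:
        # Fallback for trying this sketch without OCRmyPDF installed.
        def hookimpl(func):
            return func


    @hookimpl
    def add_options(parser):
        # Hypothetical flag; assumes parser follows the argparse interface.
        parser.add_argument(
            "--grayscale-ocr",
            action="store_true",
            help="Convert the OCR image to grayscale (illustrative option)",
        )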

Packaged plugins
================

Packaged plugins may be installed into the same virtual environment as OCRmyPDF.
They may be invoked using standard Python module naming.
If you intend to distribute a plugin, please package it.

.. code-block:: bash

    ocrmypdf --plugin ocrmypdf_fancypants.pockets.contents input.pdf output.pdf

OCRmyPDF does not automatically import plugins, because the assumption is that
plugins affect different files differently and you may not want them activated
all the time. The command line or ``ocrmypdf.ocr(plugin='...')`` must call
for them.

Third parties that wish to distribute packages for OCRmyPDF should package them
as packaged plugins, and these modules should begin with the name ``ocrmypdf_``,
similar to ``pytest`` packages such as ``pytest-cov`` (the package) and
``pytest_cov`` (the module).

.. note::

    We strongly recommend plugin authors name their plugins with the prefix
    ``ocrmypdf-`` (for the package name on PyPI) and ``ocrmypdf_`` (for the
    module), just like pytest plugins.

Plugin hooks
============

A plugin may provide the following hooks. Hooks should be decorated with
``ocrmypdf.hookimpl``, for example:

.. code-block:: python

    from ocrmypdf import hookimpl

    @hookimpl
    def add_options(parser):
        pass

The following is a complete list of hooks that may be installed and when
they are called.

.. _firstresult:

Note on firstresult hooks
^^^^^^^^^^^^^^^^^^^^^^^^^

If multiple plugins install implementations for a firstresult hook, they will be
called in the reverse of the order in which they are installed (i.e., last
plugin wins). As the implementations are called in that order, the first one
that returns a value other than ``None`` will "win" and prevent execution of all
other implementations. As such, you cannot "chain" a series of plugin filters
together in this way. Instead, a single hook implementation should be
responsible for any such chaining operations.
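
The dispatch rule described above can be simulated in plain Python. This is an
illustrative sketch, not ``pluggy``'s own code; ``call_firstresult`` and the
three ``plugin_*`` functions are hypothetical names.

.. code-block:: python

    def call_firstresult(implementations, *args):
        """Simulate firstresult dispatch.

        ``implementations`` is listed in installation order, so the most
        recently installed implementation is tried first.
        """
        for impl in reversed(implementations):
            result = impl(*args)
            if result is not None:
                return result  # remaining implementations are never called
        return None


    def plugin_a(text):
        return text.upper()


    def plugin_b(text):
        return None  # declines, so the next implementation is tried


    def plugin_c(text):
        return text + "!"


    # plugin_c was installed last, so it is called first and wins outright.
    winner = call_firstresult([plugin_a, plugin_b, plugin_c], "scan")

    # plugin_b declines, so plugin_a gets its turn.
    fallback = call_firstresult([plugin_a, plugin_b], "scan")


    def combined(text):
        # A single implementation that performs the chaining itself.
        return plugin_c(plugin_a(text))


    chained = call_firstresult([combined], "scan")

Because ``plugin_a`` never runs once ``plugin_c`` has returned a value, the only
way to apply both is a wrapper such as ``combined``.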

Custom command line arguments
-----------------------------

.. autofunction:: ocrmypdf.pluginspec.add_options

.. autofunction:: ocrmypdf.pluginspec.check_options
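
As a hedged sketch of how these two hooks might work together, the plugin below
adds an option and then validates it. The ``--border-px`` option is
hypothetical, the ``check_options(options)`` signature is an assumption not
shown in this section, and ``ValueError`` is used only for illustration; the
generated reference above is the authoritative source for the signature and the
expected exception type.

.. code-block:: python

    try:
        from ocrmypdf import hookimpl
    except ImportError:
        # Fallback for trying this sketch without OCRmyPDF installed.
        def hookimpl(func):
            return func


    @hookimpl
    def add_options(parser):
        # Hypothetical option; assumes parser follows the argparse interface.
        parser.add_argument(
            "--border-px",
            type=int,
            default=0,
            help="Width of an illustrative border, in pixels",
        )


    @hookimpl
    def check_options(options):
        # Assumed to receive the parsed options namespace.
        if options.border_px < 0:
            # Illustrative exception type only (assumption).
            raise ValueError("--border-px must not be negative")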

Applying special behavior before processing
-------------------------------------------

.. autofunction:: ocrmypdf.pluginspec.validate

PDF page to image
-----------------

.. autofunction:: ocrmypdf.pluginspec.rasterize_pdf_page

Modifying intermediate images
-----------------------------

.. autofunction:: ocrmypdf.pluginspec.filter_ocr_image

.. autofunction:: ocrmypdf.pluginspec.filter_page_image
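
A minimal sketch of an image filter follows. The ``(page, image)`` signature,
the ``page.options`` attribute, and the ``grayscale_ocr`` option are assumptions
not stated in this section, and ``image`` is assumed to offer a Pillow-style
``convert()`` method; consult the generated reference above for the
authoritative signature.

.. code-block:: python

    try:
        from ocrmypdf import hookimpl
    except ImportError:
        # Fallback for trying this sketch without OCRmyPDF installed.
        def hookimpl(func):
            return func


    @hookimpl
    def filter_ocr_image(page, image):
        # Hypothetical option, e.g. registered by the plugin's add_options hook.
        if getattr(page.options, "grayscale_ocr", False):
            # "L" is Pillow's 8-bit grayscale mode.
            return image.convert("L")
        # Otherwise hand the image back unchanged.
        return image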

OCR engine
----------

.. autofunction:: ocrmypdf.pluginspec.get_ocr_engine

.. autoclass:: ocrmypdf.pluginspec.OcrEngine
    :members:

    .. automethod:: __str__

.. autoclass:: ocrmypdf.pluginspec.OrientationConfidence

PDF/A production
----------------

.. autofunction:: ocrmypdf.pluginspec.generate_pdfa