OCRmyPDF/docs/plugins.rst

119 lines
3.4 KiB
ReStructuredText
Raw Normal View History

=======
2019-06-20 02:44:29 -07:00
Plugins
=======
2020-05-07 03:53:37 -07:00
You can use plugins to customize the behavior of OCRmyPDF at certain points of
interest.
2019-06-20 02:44:29 -07:00
2020-05-07 03:53:37 -07:00
Currently, it is possible to:
- add new command line arguments
- override the decision for whether or not to perform OCR on a particular file
- modify the image is about to be sent for OCR
- modify the page image before it is converted to PDF
OCRmyPDF plugins are based on the Python ``pluggy`` package and conform to its
conventions. Note that: plugins installed with as setuptools entrypoints are
not checked currently, because OCRmyPDF assumes you may not want to enable
plugins for all files. Also, plugins must be functions, not classes.
2019-06-20 02:44:29 -07:00
How plugins are imported
========================
2019-06-20 02:44:29 -07:00
2020-05-07 03:53:37 -07:00
Plugins are imported on demand, by the OCRmyPDF worker process that needs to use
them. As such, plugins cannot share state with other plugins, cannot rely on
their module's or the interpreter's global state, and should expect asynchronous
copies of themselves to be running. Plugins can write intermediate files to the
folder specified in ``options.work_folder``.
Plugins should work whether executed in threads or processes.
Script plugins
==============
Script plugins may be called from the command line, by specifying the name of a file.
.. code-block:: bash
ocrmypdf --plugin example_plugin.py input.pdf output.pdf
Multiple plugins may be called by issuing the ``--plugin`` argument multiple times.
Packaged plugins
================
Installed plugins may be installed into the same virtual environment as OCRmyPDF
is installed into. They may be invoked using Python standard module naming.
.. code-block:: bash
ocrmypdf --plugin ocrmypdf_fancypants.pockets.contents input.pdf output.pdf
OCRmyPDF does not automatically import plugins, because the assumption is that
plugins affect different files differently and you may not want them activated
all the time. The command line or ``ocrmypdf.ocr(plugin='...')`` must call
for them.
Third parties that wish to distribute packages for ocrmypdf should package them
as packaged plugins, and these modules should begin with the name ``ocrmypdf_``
similar to ``pytest`` packages such as ``pytest-cov`` (the package) and
``pytest_cov`` (the module).
Plugin hooks
============
A plugin may provide the following hooks. Hooks should be decorated with
``ocrmypdf.hookimpl``, for example:
.. code-block:: python
from ocrmpydf import hookimpl
2019-06-20 02:44:29 -07:00
2020-05-07 03:53:37 -07:00
@hookimpl
2020-06-15 12:51:49 -07:00
def add_options(parser):
2020-05-07 03:53:37 -07:00
pass
2019-06-20 02:44:29 -07:00
2020-05-07 03:53:37 -07:00
The following is a complete list of hooks that may be installed and when
they are called.
2019-06-20 02:44:29 -07:00
2020-06-30 04:20:14 -07:00
Custom command line arguments
-----------------------------
.. autofunction:: ocrmypdf.pluginspec.add_options
.. autofunction:: ocrmypdf.pluginspec.check_options
Applying special behavior before processing
-------------------------------------------
.. autofunction:: ocrmypdf.pluginspec.validate
PDF page to image
-----------------
.. autofunction:: ocrmypdf.pluginspec.rasterize_pdf_page
Modifying intermediate images
-----------------------------
.. autofunction:: ocrmypdf.pluginspec.filter_ocr_image
.. autofunction:: ocrmypdf.pluginspec.filter_page_image
OCR engine
----------
.. autofunction:: ocrmypdf.pluginspec.get_ocr_engine
.. autoclass:: ocrmypdf.pluginspec.OcrEngine
2020-05-07 03:53:37 -07:00
:members:
2020-06-30 04:20:14 -07:00
.. automethod:: __str__
.. autoclass:: ocrmypdf.pluginspec.OrientationConfidence
PDF/A production
----------------
.. autofunction:: ocrmypdf.pluginspec.generate_pdfa