===================
Installing OCRmyPDF
===================

.. |latest| image:: https://img.shields.io/pypi/v/ocrmypdf.svg
    :alt: OCRmyPDF latest released version on PyPI

|latest|

The easiest way to install OCRmyPDF is to follow the steps for your operating
system/platform, although sometimes this version may be out of date. This
installation guide provides information allowing you to compare the current
version to the one provided by your platform.

If you want to use the latest version of OCRmyPDF and all of its optional
dependencies, the easiest way to get that is to install the Homebrew package. Homebrew
is best known as a macOS package manager, but also works for
`Linux and Windows Subsystem for Linux <https://docs.brew.sh/Homebrew-on-Linux>`__.
After Homebrew is installed, simply run ``brew install ocrmypdf``.

You can also use the more detailed procedures here to manually install OCRmyPDF
from source or with the ``pip`` package manager for binary wheels. The reason
for these varied steps is that OCRmyPDF requires third-party executables that are
not part of Python.

.. contents:: Platform-specific steps
    :depth: 2
    :local:

Installing on Linux
===================

Debian and Ubuntu 18.04 or newer
--------------------------------

.. |deb-stable| image:: https://repology.org/badge/version-for-repo/debian_stable/ocrmypdf.svg
    :alt: Debian 9 stable ("stretch")

.. |deb-testing| image:: https://repology.org/badge/version-for-repo/debian_testing/ocrmypdf.svg
    :alt: Debian 10 testing ("buster")

.. |deb-unstable| image:: https://repology.org/badge/version-for-repo/debian_unstable/ocrmypdf.svg
    :alt: Debian unstable

.. |ubu-1804| image:: https://repology.org/badge/version-for-repo/ubuntu_18_04/ocrmypdf.svg
    :alt: Ubuntu 18.04 LTS

.. |ubu-1810| image:: https://repology.org/badge/version-for-repo/ubuntu_18_10/ocrmypdf.svg
    :alt: Ubuntu 18.10

.. |ubu-1904| image:: https://repology.org/badge/version-for-repo/ubuntu_19_04/ocrmypdf.svg
    :alt: Ubuntu 19.04

.. |ubu-1910| image:: https://repology.org/badge/version-for-repo/ubuntu_19_10/ocrmypdf.svg
    :alt: Ubuntu 19.10

+-----------------------------------------------+
| **OCRmyPDF versions in Debian & Ubuntu**      |
+-----------------------------------------------+
| |latest|                                      |
+-----------------------------------------------+
| |deb-stable| |deb-testing| |deb-unstable|     |
+-----------------------------------------------+
| |ubu-1804| |ubu-1810| |ubu-1904| |ubu-1910|   |
+-----------------------------------------------+

Users of Debian 9 ("stretch") or later or Ubuntu 18.04 or later, including users
of Windows Subsystem for Linux, may simply run:

.. code-block:: bash

    apt-get install ocrmypdf

As indicated in the table above, Debian and Ubuntu releases may lag
behind the latest version. If the version available for your platform is
out of date, you could opt to install the latest version from source.
See `Installing HEAD revision from
sources <#installing-head-revision-from-sources>`__. Ubuntu 16.10 to 17.10
inclusive also had ocrmypdf, but these versions are end of life.

For full details on version availability for your platform, check the
`Debian Package Tracker <https://tracker.debian.org/pkg/ocrmypdf>`__ or
`Ubuntu launchpad.net <https://launchpad.net/ocrmypdf>`__.

.. note::

    OCRmyPDF for Debian and Ubuntu currently omits the JBIG2 encoder.
    OCRmyPDF works fine without it but will produce larger output files.
    If you build jbig2enc from source, ocrmypdf 7.0.0 and later will
    automatically detect it (specifically the ``jbig2`` binary) on the
    ``PATH``. To add JBIG2 encoding, see :ref:`jbig2`.

Fedora 29 or newer
------------------

.. |fedora-29| image:: https://repology.org/badge/version-for-repo/fedora_29/ocrmypdf.svg
    :alt: Fedora 29

.. |fedora-30| image:: https://repology.org/badge/version-for-repo/fedora_30/ocrmypdf.svg
    :alt: Fedora 30

.. |fedora-rawhide| image:: https://repology.org/badge/version-for-repo/fedora_rawhide/ocrmypdf.svg
    :alt: Fedora Rawhide

+-----------------------------------------------+
| **OCRmyPDF version**                          |
+-----------------------------------------------+
| |latest|                                      |
+-----------------------------------------------+
| |fedora-29| |fedora-30| |fedora-rawhide|      |
+-----------------------------------------------+

Users of Fedora 29 or later may simply run:

.. code-block:: bash

    dnf install ocrmypdf

For full details on version availability, check the `Fedora Package
Tracker <https://apps.fedoraproject.org/packages/ocrmypdf>`__.

If the version available for your platform is out of date, you could opt
to install the latest version from source. See `Installing HEAD revision
from sources <#installing-head-revision-from-sources>`__.

.. note::

    OCRmyPDF for Fedora currently omits the JBIG2 encoder due to patent
    issues. OCRmyPDF works fine without it but will produce larger output
    files. If you build jbig2enc from source, ocrmypdf 7.0.0 and later
    will automatically detect it on the ``PATH``. To add JBIG2 encoding,
    see `Installing the JBIG2 encoder <jbig2>`__.

.. _ubuntu-lts-latest:

Installing the latest version on Ubuntu 18.04 LTS
-------------------------------------------------

Ubuntu 18.04 includes ocrmypdf 6.1.2. To install a more recent version,
first install the system version to get most of the dependencies:

.. code-block:: bash

    sudo apt-get -y update
    sudo apt-get -y install \
        ghostscript \
        icc-profiles-free \
        liblept5 \
        libxml2 \
        pngquant \
        python3-cffi \
        python3-distutils \
        python3-pkg-resources \
        python3-reportlab \
        qpdf \
        tesseract-ocr \
        zlib1g

We will need a newer version of ``pip`` than was available for Ubuntu 18.04:

.. code-block:: bash

    wget https://bootstrap.pypa.io/get-pip.py && python3 get-pip.py

Then install the most recent ocrmypdf for the local user and set the
user's ``PATH`` to check for the user's Python packages.

.. code-block:: bash

    export PATH=$HOME/.local/bin:$PATH
    python3 -m pip install --user ocrmypdf

To add JBIG2 encoding, see :ref:`jbig2`.

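
The ``export PATH`` line in the procedure above only affects the current
shell session, and repeating it adds duplicate entries. As a minimal sketch
(the helper name ``path_prepend`` is hypothetical and not part of OCRmyPDF),
a guard such as the following could be placed in ``~/.profile`` so that
``$HOME/.local/bin`` is added once per login:

.. code-block:: bash

    # Hypothetical helper: prepend a directory to PATH only if it is not
    # already listed there.
    path_prepend() {
        case ":$PATH:" in
            *":$1:"*) ;;                      # already present, nothing to do
            *) PATH="$1${PATH:+:$PATH}" ;;    # prepend, avoiding a stray colon
        esac
        export PATH
    }

    path_prepend "$HOME/.local/bin"

Calling the helper a second time with the same directory leaves ``PATH``
unchanged.
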
Ubuntu 16.04 LTS
----------------

No package is available for Ubuntu 16.04. OCRmyPDF 8.0 and newer require
Python 3.6. Ubuntu 16.04 ships Python 3.5, but you can install Python
3.6 on it. Or, you can skip Python 3.6 and install OCRmyPDF 7.x or older;
for that procedure, please see the installation documentation for the
version of OCRmyPDF you plan to use.

**Install system packages for OCRmyPDF**

.. code-block:: bash

    sudo apt-get update
    sudo apt-get install -y software-properties-common python-software-properties
    sudo add-apt-repository -y \
        ppa:jonathonf/python-3.6 \
        ppa:alex-p/tesseract-ocr
    sudo apt-get update
    sudo apt-get install -y \
        ghostscript \
        libexempi3 \
        libffi6 \
        pngquant \
        python3.6 \
        qpdf \
        tesseract-ocr \
        unpaper

This will install a Python 3.6 binary at ``/usr/bin/python3.6``
alongside the system's Python 3.5. Do not remove the system Python. This
will also install Tesseract 4.0 from a PPA, since the version available
in Ubuntu 16.04 is too old for OCRmyPDF.

Now install pip for Python 3.6. This will install the Python 3.6 version
of ``pip`` at ``/usr/local/bin/pip``.

.. code-block:: bash

    curl https://bootstrap.pypa.io/get-pip.py | sudo python3.6

**Install OCRmyPDF**

OCRmyPDF requires the locale to be set for UTF-8. **On some minimal
Ubuntu installations**, such as the Ubuntu 16.04 Docker images, it may be
necessary to set the locale.

.. code-block:: bash

    # Optional: Only need to set these if they are not already set
    export LC_ALL=C.UTF-8
    export LANG=C.UTF-8

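
If you are unsure whether the locale is already UTF-8, a check along these
lines may help. This is only a sketch: ``is_utf8_locale`` is a hypothetical
helper, it inspects the environment variables rather than calling
``locale``, and it assumes the usual ``name.UTF-8`` spelling of the locale
name.

.. code-block:: bash

    # Hypothetical helper: succeed if the effective character locale is UTF-8.
    # Precedence follows POSIX: LC_ALL, then LC_CTYPE, then LANG.
    is_utf8_locale() {
        effective="${LC_ALL:-${LC_CTYPE:-${LANG:-}}}"
        case "$effective" in
            *.UTF-8|*.utf8|*.utf-8|*.UTF8) return 0 ;;
            *) return 1 ;;
        esac
    }

    # Only override the locale when it is not already UTF-8.
    if ! is_utf8_locale; then
        export LC_ALL=C.UTF-8
        export LANG=C.UTF-8
    fi
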
Now install OCRmyPDF for the current user, and ensure that the ``PATH``
environment variable contains ``$HOME/.local/bin``.

.. code-block:: bash

    export PATH=$HOME/.local/bin:$PATH
    pip3 install --user ocrmypdf

To add JBIG2 encoding, see :ref:`jbig2`.

Ubuntu 14.04 LTS
----------------

Installing on Ubuntu 14.04 LTS (trusty) is more difficult than some
other options, because of its age. Several backports are required. For
explanations of some steps of this procedure, see the similar steps for
Ubuntu 16.04.

Install system dependencies:

.. code-block:: bash

    sudo apt-get update
    sudo apt-get install \
        software-properties-common python-software-properties \
        zlib1g-dev \
        libexempi3 \
        libjpeg-dev \
        libffi-dev \
        pngquant \
        qpdf

We will need backports of Ghostscript 9.16, libav-11 (for unpaper 6.1),
Tesseract 4.00 (alpha), and Python 3.6. This will replace Ghostscript
and Tesseract 3.x on your system. Python 3.6 will be installed alongside
the system Python 3.4.

If you prefer to not modify your system in this manner, consider using a
Docker container.

.. code-block:: bash

    sudo add-apt-repository ppa:vshn/ghostscript -y
    sudo add-apt-repository ppa:heyarje/libav-11 -y
    sudo add-apt-repository ppa:alex-p/tesseract-ocr -y
    sudo add-apt-repository ppa:jonathonf/python-3.6 -y

    sudo apt-get update

    sudo apt-get install \
        python3.6-dev \
        ghostscript \
        tesseract-ocr \
        tesseract-ocr-eng \
        libavformat56 libavcodec56 libavutil54 \
        wget

Now we need to install ``pip`` and let it install ocrmypdf:

.. code-block:: bash

    curl https://bootstrap.pypa.io/ez_setup.py -o - | python3.6 && python3.6 -m easy_install pip
    pip3.6 install ocrmypdf

The optional dependency ``unpaper`` is only available at 0.4.2 in Ubuntu 14.04,
and no backports are available. Previously the author maintained a backported
.deb package for unpaper 6.1, but since Ubuntu 14.04 is now end of life, this is
not supported. As such, ``unpaper`` is not available on Ubuntu 14.04 or must be
compiled by hand.

To add JBIG2 encoding, see :ref:`jbig2`.

ArchLinux (AUR)
---------------

.. image:: https://repology.org/badge/version-for-repo/aur/ocrmypdf.svg
    :alt: ArchLinux
    :target: https://repology.org/metapackage/ocrmypdf

There is an `ArchLinux User Repository package for
ocrmypdf <https://aur.archlinux.org/packages/ocrmypdf/>`__. If you have any
idea how to actually install the package, please feel free to contribute
appropriate instructions, as this author is completely mystified by ArchLinux.

If you have any difficulties with installation, check the repository
package page.

Alpine Linux
------------

.. image:: https://repology.org/badge/version-for-repo/alpine_edge/ocrmypdf.svg
    :alt: Alpine Linux
    :target: https://repology.org/metapackage/ocrmypdf

To install OCRmyPDF for Alpine Linux:

.. code-block:: bash

    apk add ocrmypdf

Other Linux packages
--------------------

See the
`Repology <https://repology.org/metapackage/ocrmypdf/versions>`__ page.

In general, first install the OCRmyPDF package for your system, then
optionally use the procedure `Installing with Python
pip <#installing-with-python-pip>`__ to install a more recent version.

Installing on macOS
===================

Homebrew
--------

.. image:: https://img.shields.io/homebrew/v/ocrmypdf.svg
    :alt: homebrew
    :target: http://brewformulas.org/Ocrmypdf

OCRmyPDF is now a standard `Homebrew <https://brew.sh>`__ formula. To
install on macOS:

.. code-block:: bash

    brew install ocrmypdf

This will include only the English language pack. If you need other
languages you can optionally install them all:

.. code-block:: bash

    brew install tesseract-lang  # Optional: Install all language packs

.. note::

    Users who previously installed OCRmyPDF on macOS using
    ``pip install ocrmypdf`` should remove the pip version
    (``pip3 uninstall ocrmypdf``) before switching to the Homebrew
    version.

.. note::

    Users who previously installed OCRmyPDF from the private tap should
    switch to the mainline version (``brew untap jbarlow83/ocrmypdf``)
    and install from there.

Manual installation on macOS
----------------------------

These instructions probably work on all macOS supported by Homebrew.

If it's not already present, `install Homebrew <http://brew.sh/>`__.

Update Homebrew:

.. code-block:: bash

    brew update

Install or upgrade the required Homebrew packages, if any are missing.
To do this, download the ``Brewfile`` that lists all of the dependencies
to the current directory, and run ``brew bundle`` to process them
(installing or upgrading as needed). ``Brewfile`` is a plain text file.

.. code-block:: bash

    wget https://github.com/jbarlow83/OCRmyPDF/raw/master/.travis/Brewfile
    brew bundle

This will include the English, French, German and Spanish language
packs. If you need other languages you can optionally install them all:

.. _macos-all-languages:

.. code-block:: bash

    brew install tesseract-lang  # Optional: install all language packs

Update the Homebrew pip:

.. code-block:: bash

    pip3 install --upgrade pip

You can then install OCRmyPDF from PyPI, for the current user:

.. code-block:: bash

    pip3 install --user ocrmypdf

or system-wide:

.. code-block:: bash

    pip3 install ocrmypdf

The command line program should now be available:

.. code-block:: bash

    ocrmypdf --help

Installing on FreeBSD
=====================

.. image:: https://repology.org/badge/version-for-repo/freebsd/python:ocrmypdf.svg
    :alt: FreeBSD
    :target: https://repology.org/project/python:ocrmypdf/versions

FreeBSD 11.2, 11.3, 12.0-RELEASE and 13.0-CURRENT are supported. Other
versions likely work but have not been tested.

.. code-block:: bash

    pkg install py36-ocrmypdf

To install a more recent version, you could attempt to first install the system
version with ``pkg``, then use ``pip install --user ocrmypdf``.

Installing the Docker image
===========================

For some users, installing the Docker image will be easier than
installing all of OCRmyPDF's dependencies.

See `OCRmyPDF Docker Image <docker>`__ for more information.

Installing on Windows
=====================

.. warning::

    Native Windows support is new. Consider it "beta" software. Some
    functionality is missing or may be more difficult to enable. If you need a
    production-ready solution, use Windows Subsystem for Linux or a Docker
    image.

.. note::

    Administrator privileges will be required for some of these steps.

You must install the following for Windows:

* Python 3.7 (64-bit)
* Tesseract 4.0 or later
* Ghostscript 9.50 or later

You can install these with the Chocolatey package manager:

* ``choco install python3``
* ``choco install --pre tesseract``
* ``choco install ghostscript``

Also consider adding:

* ``choco install pngquant``

Windows 10 64-bit and 64-bit versions of applications are recommended. Earlier
versions of Windows and 32-bit versions of these programs are not tested, and not
supported at this time.

OCRmyPDF will check for Tesseract-OCR and Ghostscript in your Program Files folder.
If they are in some other location, you may need to modify the ``PATH``
environment variable so Tesseract, Ghostscript, and any other optional executables can
be found. You can enter it in the command line or
`follow these directions <https://www.computerhope.com/issues/ch000549.htm#dospath>`_
to make the change persistent and system-wide.

You may then use pip to install ocrmypdf:

* ``pip install ocrmypdf``

Installing on Windows Subsystem for Linux
=========================================

#. Install Ubuntu 18.04 for Windows Subsystem for Linux, if not already installed.
#. Follow the procedure to install :ref:`OCRmyPDF on Ubuntu 18.04 <ubuntu-lts-latest>`.
#. Open the Windows command prompt and create a symlink:

.. code-block:: powershell

    wsl sudo ln -s /home/user/.local/bin/ocrmypdf /usr/local/bin/ocrmypdf

Then confirm that the expected version from PyPI (|latest|) is installed:

.. code-block:: powershell

    wsl ocrmypdf --version

You can then run OCRmyPDF in the Windows command prompt or PowerShell, prefixing
``wsl``, and call it from Windows programs or batch files.

Docker
^^^^^^

You can also :ref:`Install the Docker <docker-install>` container on Windows. Ensure that
your command prompt can run the docker "hello world" container.

Installing with Python pip
==========================

OCRmyPDF is delivered by PyPI because it is a convenient way to install
the latest version. However, PyPI and ``pip`` cannot address the fact
that ``ocrmypdf`` depends on certain non-Python system libraries and
programs being installed.

For best results, first install `your platform's
version <https://repology.org/metapackage/ocrmypdf/versions>`__ of
``ocrmypdf``, using the instructions elsewhere in this document. Then
you can use ``pip`` to get the latest version if your platform version
is out of date. Chances are that this will satisfy most dependencies.

Use ``ocrmypdf --version`` to confirm what version was installed.

Then you can install the latest OCRmyPDF from the Python wheels. First
try:

.. code-block:: bash

    pip3 install --user ocrmypdf

You should then be able to run ``ocrmypdf --version`` and see that the
latest version was located.

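
To compare the installed version against the one you expect, a small shell
check can be used. The following is a sketch under stated assumptions:
``version_ge`` is a hypothetical helper, the ``wanted`` value is a
placeholder to replace with the version shown on PyPI, ``sort -V`` must be
available (GNU coreutils provides it), and ``ocrmypdf --version`` is assumed
to print only the version number. Pre-release suffixes may not sort as
expected.

.. code-block:: bash

    # Hypothetical helper: succeed if version $1 is greater than or equal to $2.
    # The smaller of the two versions sorts first under `sort -V`.
    version_ge() {
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
    }

    wanted="9.0.0"   # placeholder: substitute the version you expect
    if command -v ocrmypdf >/dev/null 2>&1; then
        installed="$(ocrmypdf --version)"
        if version_ge "$installed" "$wanted"; then
            echo "ocrmypdf $installed is at least $wanted"
        else
            echo "ocrmypdf $installed is older than $wanted"
        fi
    else
        echo "ocrmypdf not found on PATH"
    fi

If the older platform version is still reported, the platform copy is
probably found earlier on the ``PATH`` than ``$HOME/.local/bin``.
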
Since ``pip3 install --user`` does not work correctly on some platforms,
notably Ubuntu 16.04 and older, and the Homebrew version of Python,
instead use this for a system-wide installation:

.. code-block:: bash

    pip3 install ocrmypdf

Requirements for pip and HEAD install
-------------------------------------

OCRmyPDF currently requires these external programs and libraries to be
installed. They must be provided by the operating system package
manager; ``pip`` cannot provide them.

- Python 3.6 or newer
- Ghostscript 9.15 or newer
- qpdf 8.1.0 or newer
- Tesseract 4.0.0-beta or newer

As of ocrmypdf 7.2.1, the following versions are recommended:

- Python 3.7 or 3.8
- Ghostscript 9.23 or newer
- qpdf 8.2.1
- Tesseract 4.0.0 or newer
- jbig2enc 0.29 or newer
- pngquant 2.5 or newer
- unpaper 6.1

jbig2enc, pngquant, and unpaper are optional. If they are missing, certain
features are disabled. OCRmyPDF will discover them as soon as they are
available.

**jbig2enc**, if present, will be used to optimize the encoding of
monochrome images. This can significantly reduce the file size of the
output file. It is not required.
`jbig2enc <https://github.com/agl/jbig2enc>`__ is not generally
available for Ubuntu or Debian due to lingering concerns about patent
issues, but can easily be built from source. To add JBIG2 encoding, see
:ref:`jbig2`.

**pngquant**, if present, is optionally used to optimize the encoding of
PNG-style images in PDFs (actually, any that are losslessly
encoded) by lossily quantizing to a smaller color palette. It is only
activated when the ``--optimize`` argument is ``2`` or ``3``.

**unpaper**, if present, enables the ``--clean`` and ``--clean-final``
command line options.

These are in addition to the Python packaging dependencies, meaning that
unfortunately, the ``pip install`` command cannot satisfy all of them.

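
Before running ``pip``, it can be useful to see which of these programs are
already on the ``PATH``. The sketch below is not part of OCRmyPDF:
``missing_tools`` is a hypothetical helper, and the executable names are
assumptions (``gs`` for Ghostscript, ``jbig2`` for jbig2enc). It checks only
for presence, not for the minimum versions listed above.

.. code-block:: bash

    # Hypothetical helper: print each named program that is NOT found on PATH.
    missing_tools() {
        for tool in "$@"; do
            command -v "$tool" >/dev/null 2>&1 || echo "$tool"
        done
    }

    # Required programs first, then the optional ones.
    echo "Missing required: $(missing_tools python3 gs qpdf tesseract | tr '\n' ' ')"
    echo "Missing optional: $(missing_tools jbig2 pngquant unpaper | tr '\n' ' ')"

An empty list after "Missing required" suggests the system packages are in
place; anything listed should be installed with the platform's package
manager.
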
Installing HEAD revision from sources
=====================================

If you have ``git`` and Python 3.6 or newer installed, you can install
from source. When the ``pip`` installer runs, it will alert you if
dependencies are missing.

If you prefer to build everything from source, you will need to `build
pikepdf from
source <https://pikepdf.readthedocs.io/en/latest/installation.html#building-from-source>`__.
First ensure you can build and install pikepdf.

To install the HEAD revision from sources in the current Python 3
environment:

.. code-block:: bash

    pip3 install git+https://github.com/jbarlow83/OCRmyPDF.git

Or, to install in `development
mode <https://pythonhosted.org/setuptools/setuptools.html#development-mode>`__,
allowing customization of OCRmyPDF, use the ``-e`` flag:

.. code-block:: bash

    # An editable install from a VCS URL needs the project name via #egg=
    pip3 install -e "git+https://github.com/jbarlow83/OCRmyPDF.git#egg=ocrmypdf"

You may find it easiest to install in a virtual environment, rather than
system-wide:

.. code-block:: bash

    git clone -b master https://github.com/jbarlow83/OCRmyPDF.git
    python3 -m venv venv
    source venv/bin/activate
    cd OCRmyPDF
    pip3 install .

However, ``ocrmypdf`` will only be accessible on the system PATH when
you activate the virtual environment.

To run the program:

.. code-block:: bash

    ocrmypdf --help

If not yet installed, the script will notify you about dependencies that
need to be installed. The script requires specific versions of the
dependencies. Versions older than the ones mentioned in the release notes
are unlikely to be compatible with OCRmyPDF.

For development
---------------

To install all of the development and test requirements:

.. code-block:: bash

    git clone -b master https://github.com/jbarlow83/OCRmyPDF.git
    python3 -m venv venv
    source venv/bin/activate
    cd OCRmyPDF
    pip install -e .
    pip install -r requirements/dev.txt -r requirements/test.txt

To add JBIG2 encoding, see :ref:`jbig2`.

Shell completions
=================

Completions for ``bash`` and ``fish`` are available in the project's
``misc/completion`` folder. The ``bash`` completions are likely ``zsh``
compatible but this has not been confirmed. Package maintainers, please
install these at the appropriate locations for your system.

To manually install the ``bash`` completion, copy
``misc/completion/ocrmypdf.bash`` to ``/etc/bash_completion.d/ocrmypdf``
(rename the file).

To manually install the ``fish`` completion, copy
``misc/completion/ocrmypdf.fish`` to
``~/.config/fish/completions/ocrmypdf.fish``.

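
The two copy steps can be scripted. The following is a sketch only:
``install_completions`` is a hypothetical helper, not something shipped with
OCRmyPDF, and it assumes you have a source checkout containing the
``misc/completion`` folder.

.. code-block:: bash

    # Hypothetical helper: copy both completion files from a source checkout.
    # $1 = path to the OCRmyPDF checkout
    # $2 = bash completion directory, $3 = fish completion directory
    install_completions() {
        mkdir -p "$2" "$3" &&
        cp "$1/misc/completion/ocrmypdf.bash" "$2/ocrmypdf" &&
        cp "$1/misc/completion/ocrmypdf.fish" "$3/ocrmypdf.fish"
    }

    # Example call, left commented out because /etc/bash_completion.d is
    # usually writable only by root:
    # install_completions ./OCRmyPDF /etc/bash_completion.d ~/.config/fish/completions
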