mirror of
https://github.com/ocrmypdf/OCRmyPDF.git
synced 2026-01-04 03:01:52 +00:00
README: mention polyglot, fix container vs image
parent 0dc96442d8
commit 93bec22f9c
39
README.rst
@@ -48,22 +48,22 @@ Download OCRmyPDF here: https://github.com/jbarlow83/OCRmyPDF/releases

You can install it to a Python virtual environment or system-wide.

Installing the Docker image
~~~~~~~~~~~~~~~~~~~~~~~~~~~

For many users, installing the Docker image will be easier than installing all of OCRmyPDF's dependencies. For Windows, it is the only option.

If you have `Docker <https://docs.docker.com/>`__ installed on your system, you can install
a Docker image of the latest release.

Follow the Docker installation instructions for your platform. If you can run this command
successfully, your system is ready to download and execute the image::

    docker run hello-world

OCRmyPDF will use all available CPU cores. By default, the VirtualBox machine instance on Windows and OS X has only a single CPU core enabled. Use the VirtualBox Manager to determine the name of your Docker engine host, and then follow these optional steps to enable multiple CPUs::

    # Optional step for Mac OS X users
    docker-machine stop "yourVM"
    VBoxManage modifyvm "yourVM" --cpus 2  # or whatever number of cores is desired
    docker-machine start "yourVM"
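
To confirm that the change took effect, you can ask the Docker engine how many CPUs it sees. This is a minimal sketch, not part of OCRmyPDF: it assumes your shell's Docker CLI already talks to the VirtualBox VM (for example after ``eval "$(docker-machine env yourVM)"``) and is recent enough to support ``docker info --format``. The function name ``check_docker_cpus`` is hypothetical::

    # Hypothetical helper (not part of OCRmyPDF): print the CPU count reported by
    # the Docker engine host, or warn and fail if only one core is available.
    # Assumes the Docker CLI already targets the VM, e.g. after:
    #   eval "$(docker-machine env yourVM)"
    check_docker_cpus() {
        cpus=$(docker info --format '{{.NCPU}}') || return 1
        if [ "$cpus" -lt 2 ]; then
            echo "Docker host has only $cpus CPU; OCRmyPDF cannot use more than that" >&2
            return 1
        fi
        echo "$cpus"
    }

    # Usage:
    #   check_docker_cpus    # prints the CPU count if more than one core is enabled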
@@ -77,8 +77,14 @@ the image::
Then tag it to give a more convenient name, just ``ocrmypdf``::

    docker tag jbarlow83/ocrmypdf ocrmypdf

This image contains language packs for English, French, Spanish and German. The alternative "polyglot" image provides `all available language packs <https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc#languages>`__::

    # Alternative step: If you need all language packs
    docker pull jbarlow83/ocrmypdf-polyglot
    docker tag jbarlow83/ocrmypdf-polyglot ocrmypdf

You can then run ocrmypdf using the command::

    docker run ocrmypdf --help
@@ -90,7 +96,10 @@ In this worked example, the current working directory contains an input file cal

    docker run -v "$(pwd):/home/docker" ocrmypdf --skip-text test.pdf output.pdf

Note that ``ocrmypdf`` has its own separate ``-v VERBOSITYLEVEL`` argument to control debug verbosity. All Docker arguments should come before the ``ocrmypdf`` image name, and all arguments to ``ocrmypdf`` should be listed after it.
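
Because Docker and ``ocrmypdf`` both understand ``-v``, it can help to wrap the invocation in a shell function so the order is fixed once. This is a minimal sketch, assuming the image was tagged ``ocrmypdf`` as described earlier and that the input files are in the current directory. The function name ``ocrmypdf_docker`` is hypothetical, and Docker's ``--rm`` flag is an addition that removes each container after it exits::

    # Hypothetical wrapper (not shipped with OCRmyPDF).
    # Options before the image name go to Docker; "$@" after it goes to ocrmypdf.
    ocrmypdf_docker() {
        docker run --rm -v "$(pwd):/home/docker" ocrmypdf "$@"
    }

    # Usage: the -v below is ocrmypdf's verbosity level (1 is illustrative),
    # while the wrapper's own -v mounts the current directory into the container.
    #   ocrmypdf_docker -v 1 --skip-text test.pdf output.pdf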

The Docker image provides the English, French, German and Spanish language packs. `All other language packs <https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc#languages>`__ are available in the larger image ``jbarlow83/ocrmypdf-polyglot``.
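
If you script the installation, the choice between the two images can be made from the languages you plan to use. This is a minimal sketch, assuming the standard image's four language packs use Tesseract's codes ``eng``, ``fra``, ``spa`` and ``deu``. The function name ``ocrmypdf_image_for`` is hypothetical::

    # Hypothetical helper: print the image that covers every requested Tesseract
    # language code. Assumes the standard image ships only eng, fra, spa and deu,
    # and that the larger polyglot image ships all the others.
    ocrmypdf_image_for() {
        for lang in "$@"; do
            case "$lang" in
                eng|fra|spa|deu) ;;   # bundled in the standard image
                *)
                    echo "jbarlow83/ocrmypdf-polyglot"
                    return 0
                    ;;
            esac
        done
        echo "jbarlow83/ocrmypdf"
    }

    # Usage:
    #   image=$(ocrmypdf_image_for eng chi_sim)
    #   docker pull "$image" && docker tag "$image" ocrmypdf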
|
||||
|
||||
|
||||
Installing on Mac OS X
|
||||
~~~~~~~~~~~~~~~~~~~~~~
|
||||
@@ -228,6 +237,18 @@ need to be installed. The script requires specific versions of the
dependencies. Versions older than the ones mentioned in the release notes
are likely to be incompatible with OCRmyPDF.

Languages
---------

OCRmyPDF uses Tesseract for OCR, and relies on its language packs. For Linux users,
you can often find packages that provide language packs::

    # Debian/Ubuntu users
    sudo apt-get install tesseract-ocr-chi-sim

You can then pass the ``-l LANG`` argument to OCRmyPDF to give a hint as to what languages it should search for. Multiple
languages can be requested by joining them with ``+``, for example ``-l eng+deu``.
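
The snippet below sketches how to check which language packs Tesseract can see before passing ``-l``, and how to build a multi-language argument. It assumes that ``tesseract --list-langs`` prints one language code per line after a header line, that Debian/Ubuntu package names follow the ``tesseract-ocr-<code>`` pattern with underscores turned into hyphens (as with ``chi_sim`` and ``tesseract-ocr-chi-sim``), and that multiple languages are joined with ``+``. All function names are hypothetical::

    # Hypothetical helpers, not part of OCRmyPDF or Tesseract.

    # Join language codes with "+" for the -l option: eng chi_sim -> eng+chi_sim
    ocrmypdf_lang_arg() {
        (IFS=+; printf '%s\n' "$*")
    }

    # Assumed Debian/Ubuntu package for a language code: chi_sim -> tesseract-ocr-chi-sim
    tesseract_langpack_name() {
        printf 'tesseract-ocr-%s\n' "$(printf '%s' "$1" | tr '_' '-')"
    }

    # Succeed only if every given code is listed by `tesseract --list-langs`.
    # 2>&1 because some Tesseract versions may print the list to stderr.
    tesseract_has_langs() {
        installed=$(tesseract --list-langs 2>&1)
        for lang in "$@"; do
            printf '%s\n' "$installed" | grep -qxF "$lang" || return 1
        done
    }

    # Usage:
    #   tesseract_has_langs eng chi_sim ||
    #       sudo apt-get install "$(tesseract_langpack_name chi_sim)"
    #   ocrmypdf -l "$(ocrmypdf_lang_arg eng chi_sim)" input.pdf output.pdf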

Support
-------