.. SPDX-FileCopyrightText: 2022 James R. Barlow
..
.. SPDX-License-Identifier: CC-BY-SA-4.0

================
PDF optimization
================

OCRmyPDF includes an image-oriented PDF optimizer. By default, the optimizer
runs with safe settings, with the goal of improving compression at no loss of
quality. At higher optimization levels, lossy optimizations may be applied and
tuned. Optimization occurs after OCR, and only if OCR succeeded. The optimizer
does not perform other kinds of optimization, such as deduplicating resources,
consolidating fonts, or simplifying vector drawings.

.. list-table:: Optimization levels
   :widths: 33 6 60
   :header-rows: 1

   * - Optimization level
     - Shorthand
     - Description
   * - ``--optimize 0``
     - ``-O0``
     - Disable most optimizations.
   * - ``--optimize 1`` (default)
     - ``-O1``
     - Safe and lossless optimizations.
   * - ``--optimize 2``
     - ``-O2``
     - Safe and lossy optimizations.
   * - ``--optimize 3``
     - ``-O3``
     - Aggressive lossy optimizations.

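If OCRmyPDF is driven from a script, the optimization level can be selected
programmatically. The sketch below is a minimal illustration, not part of
OCRmyPDF: the helper ``build_ocrmypdf_args`` is hypothetical, and it uses only
the ``--optimize`` and ``--jbig2-lossy`` options described on this page,
leaving every other setting at OCRmyPDF's defaults.

.. code-block:: python

    import subprocess
    from typing import List


    def build_ocrmypdf_args(
        input_pdf: str,
        output_pdf: str,
        level: int = 1,
        jbig2_lossy: bool = False,
    ) -> List[str]:
        """Build an ocrmypdf command line for optimization level 0-3.

        Hypothetical helper: only --optimize and --jbig2-lossy are taken from
        this page; all other settings are left at ocrmypdf's defaults.
        """
        if level not in (0, 1, 2, 3):
            raise ValueError(f"optimization level must be 0-3, got {level}")
        args = ["ocrmypdf", "--optimize", str(level)]
        if jbig2_lossy:
            # Lossy JBIG2 is an advanced feature that must be requested explicitly.
            args.append("--jbig2-lossy")
        args += [input_pdf, output_pdf]
        return args


    def run_ocrmypdf(args: List[str]) -> None:
        """Run the command; raises CalledProcessError if ocrmypdf fails."""
        subprocess.run(args, check=True)

For example, ``build_ocrmypdf_args("input.pdf", "output.pdf", level=2)`` returns
``["ocrmypdf", "--optimize", "2", "input.pdf", "output.pdf"]``, which is
equivalent to ``ocrmypdf -O2 input.pdf output.pdf``.
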
The exact optimizations performed will vary between versions of OCRmyPDF, and
depend on which third-party tools are available.

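Because of this, it can help to check which optional encoders are installed
before choosing a level. The following minimal sketch looks them up on
``PATH``; it assumes the executable names ``jbig2`` (the encoder installed by
jbig2enc) and ``pngquant``, and the function names are hypothetical. It only
checks that the programs exist and does not reproduce how OCRmyPDF itself
detects or validates them.

.. code-block:: python

    import shutil
    from typing import Dict

    # Assumed executable names: jbig2enc installs its encoder as "jbig2".
    OPTIONAL_TOOLS = {
        "jbig2": "JBIG2 encoding of monochrome images",
        "pngquant": "quantization of paletted images (-O2, -O3)",
    }


    def find_optional_tools() -> Dict[str, bool]:
        """Map each optional tool name to whether it was found on PATH."""
        return {name: shutil.which(name) is not None for name in OPTIONAL_TOOLS}


    def report_optional_tools() -> None:
        """Print one line per tool: name, found/missing, and what it enables."""
        for name, found in find_optional_tools().items():
            status = "found" if found else "missing"
            print(f"{name:<10} {status:<8} {OPTIONAL_TOOLS[name]}")
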
Despite these optimizations, OCRmyPDF may still increase the overall file size:
it must embed the recognized text, and depending on the settings chosen, it may
not be able to represent the output file as compactly as the input file.

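Whether optimization paid off for a particular file is easiest to judge by
comparing the input and output sizes. A minimal sketch; the helper name
``size_change_percent`` is hypothetical.

.. code-block:: python

    import os


    def size_change_percent(input_path: str, output_path: str) -> float:
        """Return the change in file size from input to output, in percent.

        Negative values mean the output is smaller; positive values mean the
        embedded OCR text outweighed any savings from optimization.
        """
        before = os.path.getsize(input_path)
        after = os.path.getsize(output_path)
        if before == 0:
            raise ValueError(f"{input_path} is empty")
        return (after - before) / before * 100.0

Call it as ``size_change_percent("input.pdf", "output.pdf")`` after OCRmyPDF
has written the output file.
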
Optimizations that always occur
===============================

OCRmyPDF automatically replaces obsolete or inferior compression schemes, such
as RLE or LZW, with superior schemes such as Deflate, and converts monochrome
images to CCITT G4. Because these conversions are lossless, they always occur
and cannot be disabled. Other non-image objects are compressed as well.

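The sketch below illustrates why this kind of recompression is lossless: a
stream is decoded with its old filter and re-encoded with Deflate (the PDF
``FlateDecode`` filter), and the decoded bytes are identical either way. It
assumes the input uses the PDF ``RunLengthDecode`` scheme (LZW decoding is
longer and is omitted); the function names are hypothetical, and this is an
illustration of the idea rather than OCRmyPDF's implementation.

.. code-block:: python

    import zlib


    def run_length_decode(data: bytes) -> bytes:
        """Decode a stream encoded with the PDF RunLengthDecode filter."""
        out = bytearray()
        i = 0
        while i < len(data):
            length = data[i]
            i += 1
            if length == 128:
                # 128 marks the end of the data.
                break
            if length < 128:
                # 0-127: copy the next length + 1 bytes literally.
                end = i + length + 1
                if end > len(data):
                    raise ValueError("truncated literal run")
                out += data[i:end]
                i = end
            else:
                # 129-255: repeat the next byte 257 - length times.
                if i >= len(data):
                    raise ValueError("truncated repeat run")
                out += bytes([data[i]]) * (257 - length)
                i += 1
        return bytes(out)


    def recompress_rle_as_flate(data: bytes) -> bytes:
        """Decode an RLE stream and re-encode it with Deflate."""
        raw = run_length_decode(data)
        flate = zlib.compress(raw, 9)
        # Deflate is lossless, so decoding must reproduce the raw bytes exactly.
        if zlib.decompress(flate) != raw:
            raise RuntimeError("round trip failed")
        return flate
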
Fast web view
=============

OCRmyPDF automatically optimizes PDFs for "fast web view", in Adobe Acrobat's
parlance; equivalently, it linearizes them so that the resources they reference
are presented in the order a viewer needs them for sequential display. This
reduces the latency of viewing a PDF both online and from local storage, at the
cost of a slight increase in file size.

To disable this optimization and most others, use ``ocrmypdf --optimize 0 ...``
or the shorthand ``-O0``.

Adobe Acrobat might not report the file as being optimized for "fast web view".

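Independently of what Acrobat reports, a quick way to see whether a file begins
with a linearization dictionary is to look for the ``/Linearized`` key near the
start of the file, where the PDF specification requires that dictionary to
appear. The following is a heuristic sketch only, with hypothetical function
names: it does not validate the hint tables, so a file that was linearized and
later modified by an incremental update could still pass. For a thorough check,
``qpdf --check-linearization`` can be used.

.. code-block:: python

    # The PDF specification requires the linearization parameter dictionary to
    # be contained entirely within the first 1024 bytes of a linearized file.
    LINEARIZATION_WINDOW = 1024


    def head_is_linearized(data: bytes) -> bool:
        """Heuristic: does a /Linearized key appear near the start of the file?"""
        return b"/Linearized" in data[:LINEARIZATION_WINDOW]


    def file_is_linearized(path: str) -> bool:
        """Apply head_is_linearized to the first bytes of a file on disk."""
        with open(path, "rb") as f:
            return head_is_linearized(f.read(LINEARIZATION_WINDOW))
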
Lossless optimizations
======================

At optimization level ``-O1`` (the default), OCRmyPDF will also attempt lossless
image optimization.

If a JBIG2 encoder is available, monochrome images will be converted to JBIG2,
with the potential for substantial savings on large black-and-white images,
since JBIG2 is far more efficient than the other monochrome (bi-level)
compression schemes available in PDF. (All known US patents related to JBIG2
have probably expired, but it remains the responsibility of the user to supply
a JBIG2 encoder such as `jbig2enc <https://github.com/agl/jbig2enc>`__.
OCRmyPDF does not implement JBIG2 encoding on its own.)

OCRmyPDF currently does not attempt to recompress losslessly compressed objects
more aggressively.

Lossy optimizations
===================

At optimization levels ``-O2`` and ``-O3``, OCRmyPDF will attempt some lossy
image optimization.

If ``pngquant`` is installed, OCRmyPDF will use it to quantize paletted images
and reduce their size.

The quality of JPEG images may be lowered, on the assumption that a
lower-quality image is still suitable for storage after OCR.

Not all image types can be optimized; uncommon image types may be skipped by
the optimizer.

OCRmyPDF provides :ref:`lossy mode JBIG2 <jbig2-lossy>` as an advanced feature
that additionally requires the argument ``--jbig2-lossy``.