================
PDF optimization
================
OCRmyPDF includes an image-oriented PDF optimizer. By default, the optimizer
runs with safe settings, aiming to improve compression with no loss of
quality. At higher optimization levels, lossy optimizations may be applied and
tuned. Optimization occurs after OCR, and only if OCR succeeded. The optimizer
does not perform other possible optimizations such as deduplicating resources,
consolidating fonts, or simplifying vector drawings.
Optimization ranges from ``-O0`` through ``-O3``, where ``0`` disables
optimization and ``3`` enables all optimizations. ``1``, the default, performs
only safe and lossless optimizations. (This is similar to GCC's optimization
parameter.) The exact optimizations performed at each level may change between
releases.
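As an illustration of the levels described above, the following Python sketch
builds an ``ocrmypdf`` command line for a chosen level. The function name
``build_ocrmypdf_command``, the ``LEVELS`` table, and the file names are
hypothetical and exist only for this example; the only OCRmyPDF option assumed
is ``--optimize``.

```python
from typing import List

# Short labels paraphrasing this document; not an official list.
LEVELS = {
    0: "optimization disabled",
    1: "safe, lossless optimizations (default)",
    2: "lossy optimizations",
    3: "all optimizations",
}


def build_ocrmypdf_command(input_pdf: str, output_pdf: str, level: int = 1) -> List[str]:
    """Return an argv list that would run ocrmypdf at the given level."""
    if level not in LEVELS:
        raise ValueError(f"optimization level must be 0-3, got {level!r}")
    return ["ocrmypdf", "--optimize", str(level), input_pdf, output_pdf]


if __name__ == "__main__":
    # Print instead of executing, so the sketch runs without ocrmypdf installed.
    print(" ".join(build_ocrmypdf_command("input.pdf", "output.pdf", level=2)))
```

The resulting list can be passed to ``subprocess.run`` when ``ocrmypdf`` is
installed.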
PDF optimization requires third-party, optional tools for certain optimizations.
If these are not installed or cannot be found by OCRmyPDF, those optimizations
are skipped and the results will not be as good.
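Before running OCRmyPDF, it can be useful to check whether the optional tools
are on the ``PATH``. The sketch below is one way to do that from Python. It is
not how OCRmyPDF itself locates tools. The executable names are assumptions:
``jbig2`` is the name under which jbig2enc is commonly installed, and
``pngquant`` is assumed to be installed under its own name. The helper
``find_optional_tools`` is hypothetical.

```python
import shutil
from typing import Dict, Optional

# Assumed executable names for the optional tools named in this document.
OPTIONAL_TOOLS = {"jbig2enc": "jbig2", "pngquant": "pngquant"}


def find_optional_tools(tools: Dict[str, str] = OPTIONAL_TOOLS) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(exe) for name, exe in tools.items()}


if __name__ == "__main__":
    for name, path in find_optional_tools().items():
        print(f"{name}: {path or 'not found'}")
```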
Optimizations that always occur
===============================
OCRmyPDF will automatically replace obsolete or inferior compression schemes
such as RLE or LZW with superior schemes such as Deflate, and will convert
monochrome images to CCITT G4. Since this is harmless, it always occurs and
there is no way to disable it. Other, non-image objects are compressed as well.
Fast web view
=============
OCRmyPDF automatically optimizes PDFs for "fast web view" in Adobe Acrobat's
parlance, or equivalently, linearizes PDFs so that the resources they reference
are presented in the order a viewer needs them for sequential display. This
reduces the latency of viewing a PDF both online and from local storage, at the
cost of a slight increase in file size.
To disable this optimization and all others, use ``ocrmypdf --optimize 0 ...``
or the shorthand ``-O0``.
Lossless optimizations
======================
At optimization level ``-O1`` (the default), OCRmyPDF will also attempt lossless
image optimization.
If a JBIG2 encoder is available, then monochrome images will be converted to
JBIG2, with the potential for huge savings on large black and white images,
since JBIG2 is far more efficient than any other monochrome (bi-level)
compression. (All known US patents related to JBIG2 have probably expired, but
it remains the responsibility of the user to supply a JBIG2 encoder such as
`jbig2enc <https://github.com/agl/jbig2enc>`__. OCRmyPDF does not implement
JBIG2 encoding on its own.)
OCRmyPDF currently does not attempt to recompress losslessly compressed objects
more aggressively.
Lossy optimizations
===================
At optimization levels ``-O2`` and ``-O3``, OCRmyPDF will attempt some lossy
image optimization.
If ``pngquant`` is installed, OCRmyPDF will use it to quantize paletted images
to reduce their size.
The quality of JPEGs may be lowered, on the assumption that a lower quality
image may be suitable for storage after OCR.
It is not possible to optimize all image types. Uncommon image types may be
skipped by the optimizer.
OCRmyPDF provides :ref:`lossy mode JBIG2 <jbig2-lossy>` as an advanced feature
that additionally requires the argument ``--jbig2-lossy``.