diff --git a/docs/installation.rst b/docs/installation.rst
index 08e27ea3..4c51a4f1 100644
--- a/docs/installation.rst
+++ b/docs/installation.rst
@@ -377,6 +377,9 @@ Assuming you have a Docker engine running, you can download one of the three ava
     * - ocrmypdf-polyglot
       - ``docker pull jbarlow83/ocrmypdf-polyglot``
       - As above, with all available language packs.
+    * - ocrmypdf-webservice
+      - ``docker pull jbarlow83/ocrmypdf-webservice``
+      - All language packs, and a simple HTTP wrapper allowing OCRmyPDF to be used as a web service. Note that this component is licensed under AGPLv3.
 
 For example:
 
diff --git a/docs/introduction.rst b/docs/introduction.rst
index c4b4f52d..bae5aa73 100644
--- a/docs/introduction.rst
+++ b/docs/introduction.rst
@@ -90,6 +90,7 @@ Ghostscript also imposes some limitations:
 * PDFs containing JBIG2-encoded content will be converted to CCITT Group4 encoding, which has lower compression ratios, if Ghostscript PDF/A is enabled.
 * PDFs containing JPEG 2000-encoded content will be converted to JPEG encoding, which may introduce compression artifacts, if Ghostscript PDF/A is enabled.
 * Ghostscript may transcode grayscale and color images, either lossy to lossless or lossless to lossy, based on an internal algorithm. This behavior can be suppressed by setting ``--pdfa-image-compression`` to ``jpeg`` or ``lossless`` to set all images to one type or the other. Ghostscript has no option to maintain the input image's format. (Ghostscript 9.25+ can copy JPEG images without transcoding them; earlier versions will transcode.)
+* Ghostscript's PDF/A conversion removes any XMP metadata that is not one of the standard XMP metadata namespaces for PDFs. In particular, PRISM Metadata is removed.
 
 Regarding OCRmyPDF itself:
 
@@ -109,7 +110,10 @@ To the author's knowledge, OCRmyPDF is the most feature-rich and thoroughly test
 Web front-ends
 --------------
 
-* `Nextcloud OCR `_ is a free software plugin for the Nextcloud private cloud software
-* `OCRmyPDF-web `_, a micro web-frontend for OCRmyPDF (third-party, not actively maintained)
+The Docker image ocrmypdf-webservice provides a web service front-end that allows files to be submitted over HTTP and the results "downloaded". This is an HTTP server intended to simplify web services deployments; it is not intended to be deployed on the public internet and has no real security measures to speak of.
 
-Bear in mind that OCRmyPDF is not designed to be secure against malware-bearing PDFs (see `Using OCRmyPDF online`_).
+In addition, the following integrations are available:
+
+* `Nextcloud OCR `_ is a free software plugin for the Nextcloud private cloud software
+
+Bear in mind that OCRmyPDF is not designed to be secure against malware-bearing PDFs (see `Using OCRmyPDF online`_). Users should ensure they comply with OCRmyPDF's licenses and the licenses of all dependencies. In particular, OCRmyPDF requires Ghostscript, which is licensed under AGPLv3.
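
The ``--pdfa-image-compression`` setting described in the introduction.rst hunk can be exercised from Python roughly as follows. This is a minimal sketch, not part of the change itself: it assumes the ``ocrmypdf`` executable is installed and on ``PATH``, and the helper names ``build_ocrmypdf_command`` and ``run_ocrmypdf`` are hypothetical, introduced only for illustration. Only the two values named in the text (``jpeg`` and ``lossless``) are accepted here; the real CLI may allow others.

```python
import shutil
import subprocess

# Values named in the documentation text; the real CLI may accept more.
ALLOWED_COMPRESSION = ("jpeg", "lossless")


def build_ocrmypdf_command(input_pdf, output_pdf, image_compression):
    """Hypothetical helper: build an argument list that forces all images
    to one compression type during Ghostscript PDF/A conversion."""
    if image_compression not in ALLOWED_COMPRESSION:
        raise ValueError(
            f"image_compression must be one of {ALLOWED_COMPRESSION}, "
            f"got {image_compression!r}"
        )
    return [
        "ocrmypdf",
        "--pdfa-image-compression",
        image_compression,
        input_pdf,
        output_pdf,
    ]


def run_ocrmypdf(input_pdf, output_pdf, image_compression="lossless"):
    """Hypothetical helper: run the command, assuming ocrmypdf is on PATH."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf executable not found on PATH")
    cmd = build_ocrmypdf_command(input_pdf, output_pdf, image_compression)
    # check=True raises CalledProcessError if ocrmypdf exits with an error.
    return subprocess.run(cmd, check=True)
```

Building the argument list separately from running it keeps the validation testable without OCRmyPDF or Ghostscript installed; a call such as ``run_ocrmypdf("scan.pdf", "scan_ocr.pdf", "jpeg")`` would then invoke the tool on real files.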