readme: new sections "features" & "Motivation"

fritz-hh 2013-04-19 23:00:35 +02:00
parent f3e581d162
commit d7c238723b


@@ -1,9 +1,32 @@
OCRmyPDF
========
Collection of script aimed at generating searchable PDF files from PDF files containing only images
Script aimed at performing optical character recognition (OCR) on PDF files containing only images
ATTENTION: The scripts are still in development phase, please do not use!!!!
Usage: ./OCRmyPDF.sh filename.pdf
ATTENTION: THE SCRIPTS ARE STILL IN DEVELOPMENT PHASE, PLEASE DO NOT USE !!!!
Features
--------
- Generates a searchable PDF/A file from a PDF file containing only images
- Keeps the original resolution of the embedded images
- Validation of the generated file against the PDF/A specification using jhove
Motivation
----------
I searched the web for a free tool to OCR PDF files on Linux/Unix and found many, but none of them was satisfactory:
- Either they produced PDF files with misplaced text below the image (making copy/paste impossible)
- Or they changed the resolution of the embedded images
- Or they generated PDF files of a ridiculously large size
- Or they crashed when trying to OCR some of my PDF files
- Or they did not produce valid PDF files (even though they were readable with my current PDF reader)
On top of that, none of them produced PDF/A files (a format dedicated to long-term storage / archiving)
... so I decided to develop my own tool (using various existing scripts as an inspiration)
Install
--------