10 Commits

Author SHA1 Message Date
Jim Barlow
ad30833ffc Support missing tess_cfg_files parameter when omitted by OCRmyPDF.sh 2014-10-11 17:48:33 -07:00
Jim Barlow
e5c79a6666 Use TIFFs as intermediates
pdftoppm in recent versions (0.26.4 and 0.26.5) seems to be incapable of
producing valid TIFFs, so have it dump a .pnm file and let ImageMagick
figure out how to convert it to TIFF. This is not ideal, but at least
it works.
2014-10-10 01:54:16 -07:00
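
The workaround described in e5c79a6666 (have pdftoppm dump a PNM file, then let ImageMagick convert it to TIFF) might look roughly like the sketch below. It is not the code from the commit. The helper names `build_commands` and `page_to_tiff`, the `page-NNNN` file naming, and the 300 dpi default are all assumptions made for illustration. It also assumes `pdftoppm` and ImageMagick's `convert` are on the PATH (ImageMagick 7 installs the command as `magick` instead).

```python
import subprocess
from pathlib import Path


def build_commands(pdf, page, workdir, dpi=300):
    """Build the two command lines for one page (hypothetical helper).

    Returns (pdftoppm_cmd, convert_cmd, tiff_path). Nothing is executed here.
    """
    # Assumed naming scheme for intermediates; not taken from the commit.
    prefix = Path(workdir) / f"page-{page:04d}"
    ppm = prefix.with_suffix(".ppm")   # pdftoppm writes colour PNM output as .ppm
    tiff = prefix.with_suffix(".tif")

    # Step 1: rasterize a single page to PNM instead of asking for TIFF directly.
    pdftoppm_cmd = [
        "pdftoppm",
        "-r", str(dpi),                    # resolution in DPI
        "-f", str(page), "-l", str(page),  # first and last page to render
        "-singlefile",                     # no page-number suffix on the output name
        str(pdf), str(prefix),
    ]
    # Step 2: let ImageMagick turn the PNM into a TIFF.
    convert_cmd = ["convert", str(ppm), str(tiff)]
    return pdftoppm_cmd, convert_cmd, tiff


def page_to_tiff(pdf, page, workdir, dpi=300):
    """Run both steps; raises CalledProcessError if either tool fails."""
    pdftoppm_cmd, convert_cmd, tiff = build_commands(pdf, page, workdir, dpi)
    subprocess.run(pdftoppm_cmd, check=True)
    subprocess.run(convert_cmd, check=True)
    return tiff
```

Keeping command construction separate from execution means the argument lists can be inspected or logged without either tool installed.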
Jim Barlow
63dc753c1b Standardize intermediate filenames better
convert .pnm -deskew <...> .pnm seems to have a bug that produces an
invalid .pnm file, which later causes tesseract (specifically,
leptonica) to choke (versions 3.02 and 1.71, respectively). Will
change pipeline to use TIFFs internally since they are less stupid.
2014-10-10 01:30:43 -07:00
Jim Barlow
017bc1f252 Basic error handling 2014-10-10 01:07:46 -07:00
Jim Barlow
bcd67c009d Sort of working, but fragile; uses tmp folder properly now 2014-10-10 00:35:49 -07:00
Jim Barlow
2f6cfafdfc Now produces a finished OCR-PDF page 2014-10-08 03:54:06 -07:00
Jim Barlow
25234fa30b First crack at Ruffus, working well 2014-10-08 03:21:28 -07:00
Jim Barlow
dabbddb04e deskew and clean 2014-09-27 15:03:07 -07:00
Jim Barlow
fccfb4589e Moving quickly - we can now output .ppm files at correct resolution 2014-09-26 04:43:15 -07:00
Jim Barlow
5384c98013 Initial ocrpage.py rewrite into python3 2014-09-26 04:19:41 -07:00