18 Commits

Author      SHA1        Date                        Message
Jim Barlow  32ba50b8dc  2014-11-14 02:06:23 -08:00  Add Tesseract timeout to keep things reasonable
Jim Barlow  36aca45f35  2014-11-14 00:23:22 -08:00  The -dci options now work (and valid combinations thereof)
Jim Barlow  925290342d  2014-11-13 23:20:25 -08:00  Leptonica deskew can handle .pnm input, unlike imagemagick
Jim Barlow  4dc0370c57  2014-11-13 16:53:26 -08:00  Add leptonica deskew
Jim Barlow  6021684ab6  2014-11-13 15:58:57 -08:00  Attempt to fix multiprocessing pickling error
Jim Barlow  f4b1d0cdfe  2014-11-13 15:58:36 -08:00  Fix symlink error that occurs in multipage processing
Jim Barlow  d0d8048621  2014-10-17 17:28:31 -07:00  Comments
Jim Barlow  cfd119325d  2014-10-11 17:48:56 -07:00  Use abspath instead of relpath for temporary directory symlink
Jim Barlow  ad30833ffc  2014-10-11 17:48:33 -07:00  Support missing tess_cfg_files parameter when omitted by OCRmyPDF.sh
Jim Barlow  e5c79a6666  2014-10-10 01:54:16 -07:00  Use TIFFs as intermediates
                                                    pdftoppm in recent versions (0.26.4,5) seems to be incapable of
                                                    producing valid TIFFs, so have it dump a .pnm file and let ImageMagick
                                                    figure out how to convert it to TIFF. This is not ideal, but at least
                                                    it works.
Jim Barlow  63dc753c1b  2014-10-10 01:30:43 -07:00  Standardize intermediate filenames better
                                                    convert .pnm -deskew <...> .pnm seems to have a bug that produces an
                                                    invalid .pnm file which later causes tesseract (specifically,
                                                    leptonica) to choke (using 3.02/1.71 as versions, respectively). Will
                                                    change pipeline to use tiffs internally since they are less stupid.
Jim Barlow  017bc1f252  2014-10-10 01:07:46 -07:00  Basic error handling
Jim Barlow  bcd67c009d  2014-10-10 00:35:49 -07:00  Sort of working, but fragile; uses tmp folder properly now
Jim Barlow  2f6cfafdfc  2014-10-08 03:54:06 -07:00  Now produces a finished OCR-PDF page
Jim Barlow  25234fa30b  2014-10-08 03:21:28 -07:00  First crack at Ruffus, working well
Jim Barlow  dabbddb04e  2014-09-27 15:03:07 -07:00  deskew and clean
Jim Barlow  fccfb4589e  2014-09-26 04:43:15 -07:00  Moving quickly - we can now output .ppm files at correct resolution
Jim Barlow  5384c98013  2014-09-26 04:19:41 -07:00  Initial ocrpage.py rewrite into python3