21 Commits

Author SHA1 Message Date
Jim Barlow
dc2a4ab044 Logic error 2015-02-08 17:33:35 -08:00
Jim Barlow
b16d6f5b81 Implement skipping OCR when -s is specified
It appears to be necessary to disable each stage of the pipeline that is
inactive, not just the initial and terminal stages of an inactive segment.
If nothing else, this makes what is going on more explicit.
2015-02-08 17:26:16 -08:00
Jim Barlow
69ce6ff7b5 Not a named param 2014-11-22 15:35:05 -08:00
Jim Barlow
32ba50b8dc Add Tesseract timeout to keep things reasonable 2014-11-14 02:06:23 -08:00
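
Note: one way such a timeout could be applied is sketched below. This is not the code from the commit. The helper name `run_with_timeout` and its return shape are hypothetical, and it uses `subprocess.run`, which only exists in Python 3.5 and later, so the original 2014 code must have used something else.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (finished, output).

    finished is False when the process exceeded timeout_s seconds.
    Hypothetical helper for illustration only.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout_s,  # subprocess.run kills the child when this expires
        )
    except subprocess.TimeoutExpired:
        return False, b""
    return True, result.stdout


if __name__ == "__main__":
    # Stand-in for a slow OCR call: a child process that sleeps too long.
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(slow, 0.5)[0])
```

A caller could treat an unfinished run as "no text found on this page" and continue with the rest of the document, which keeps one pathological page from stalling the whole job.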
Jim Barlow
36aca45f35 The -dci options now work (and valid combinations thereof) 2014-11-14 00:23:22 -08:00
Jim Barlow
925290342d Leptonica deskew can handle .pnm input, unlike imagemagick 2014-11-13 23:20:25 -08:00
Jim Barlow
4dc0370c57 Add leptonica deskew 2014-11-13 16:53:26 -08:00
Jim Barlow
6021684ab6 Attempt to fix multiprocessing pickling error 2014-11-13 15:58:57 -08:00
Jim Barlow
f4b1d0cdfe Fix symlink error that occurs in multipage processing 2014-11-13 15:58:36 -08:00
Jim Barlow
d0d8048621 Comments 2014-10-17 17:28:31 -07:00
Jim Barlow
cfd119325d Use abspath instead of relpath for temporary directory symlink 2014-10-11 17:48:56 -07:00
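
Note: the likely motivation is that a relative symlink target is resolved against the directory containing the link, not against the current working directory, so a relative path written into a temporary folder can dangle. A minimal sketch of the absolute-path approach follows; the function name `link_into_tmp` and its parameters are hypothetical and not taken from the repository.

```python
import os


def link_into_tmp(source, tmp_dir, link_name):
    """Create tmp_dir/link_name pointing at source via an absolute path.

    Hypothetical helper for illustration only. Because the stored target
    is absolute, the link stays valid regardless of where tmp_dir lives
    or what the working directory is later.
    """
    link_path = os.path.join(tmp_dir, link_name)
    os.symlink(os.path.abspath(source), link_path)
    return link_path
```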
Jim Barlow
ad30833ffc Support missing tess_cfg_files parameter when omitted by OCRmyPDF.sh 2014-10-11 17:48:33 -07:00
Jim Barlow
e5c79a6666 Use TIFFs as intermediates
pdftoppm in recent versions (0.26.4,5) seems to be incapable of
producing valid TIFFs, so have it dump a .pnm file and let ImageMagick
figure out how to convert it to TIFF. This is not ideal, but at least
it works.
2014-10-10 01:54:16 -07:00
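
Note: the two-step workaround described in this commit message can be sketched as a pair of command builders. This is an illustration under assumptions, not the repository's code: the function names and parameters are hypothetical, the `-f`, `-l` and `-r` options are those of poppler's `pdftoppm`, and `convert` is the ImageMagick command-line tool. The name of the file that `pdftoppm` writes depends on its version, so the caller has to locate it before converting.

```python
import subprocess


def pdftoppm_cmd(pdf_path, page, dpi, out_root):
    """Build a pdftoppm command that rasterizes one page to PNM (its default output)."""
    return [
        "pdftoppm",
        "-f", str(page),  # first page to render
        "-l", str(page),  # last page to render (same page, so exactly one)
        "-r", str(dpi),   # resolution in DPI
        pdf_path,
        out_root,         # output file name root; pdftoppm appends page number and extension
    ]


def convert_cmd(pnm_path, tiff_path):
    """Build an ImageMagick command that converts the PNM file to TIFF."""
    return ["convert", pnm_path, tiff_path]


def run(cmd):
    """Run a command and raise CalledProcessError if it exits non-zero."""
    subprocess.run(cmd, check=True)
```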
Jim Barlow
63dc753c1b Standardize intermediate filenames better
convert .pnm -deskew <...> .pnm seems to have a bug that produces an
invalid .pnm file which later causes tesseract (specifically,
leptonica) to choke (using 3.02/1.71 as versions, respectively). Will
change pipeline to use tiffs internally since they are less stupid.
2014-10-10 01:30:43 -07:00
Jim Barlow
017bc1f252 Basic error handling 2014-10-10 01:07:46 -07:00
Jim Barlow
bcd67c009d Sort of working, but fragile; uses tmp folder properly now 2014-10-10 00:35:49 -07:00
Jim Barlow
2f6cfafdfc Now produces a finished OCR-PDF page 2014-10-08 03:54:06 -07:00
Jim Barlow
25234fa30b First crack at Ruffus, working well 2014-10-08 03:21:28 -07:00
Jim Barlow
dabbddb04e deskew and clean 2014-09-27 15:03:07 -07:00
Jim Barlow
fccfb4589e Moving quickly - we can now output .ppm files at correct resolution 2014-09-26 04:43:15 -07:00
Jim Barlow
5384c98013 Initial ocrpage.py rewrite into python3 2014-09-26 04:19:41 -07:00