Jim Barlow
32ba50b8dc
Add Tesseract timeout to keep things reasonable
2014-11-14 02:06:23 -08:00
Jim Barlow
36aca45f35
The -dci options now work (and valid combinations thereof)
2014-11-14 00:23:22 -08:00
Jim Barlow
925290342d
Leptonica deskew can handle .pnm input, unlike ImageMagick
2014-11-13 23:20:25 -08:00
Jim Barlow
4dc0370c57
Add leptonica deskew
2014-11-13 16:53:26 -08:00
Jim Barlow
6021684ab6
Attempt to fix multiprocessing pickling error
2014-11-13 15:58:57 -08:00
Jim Barlow
f4b1d0cdfe
Fix symlink error that occurs in multipage processing
2014-11-13 15:58:36 -08:00
Jim Barlow
d0d8048621
Comments
2014-10-17 17:28:31 -07:00
Jim Barlow
cfd119325d
Use abspath instead of relpath for temporary directory symlink
2014-10-11 17:48:56 -07:00
Jim Barlow
ad30833ffc
Support missing tess_cfg_files parameter when omitted by OCRmyPDF.sh
2014-10-11 17:48:33 -07:00
Jim Barlow
e5c79a6666
Use TIFFs as intermediates
pdftoppm in recent versions (0.26.4,5) seems to be incapable of
producing valid TIFFs, so have it dump a .pnm file and let ImageMagick
figure out how to convert it to TIFF. This is not ideal, but at least
it works.
2014-10-10 01:54:16 -07:00
Jim Barlow
63dc753c1b
Standardize intermediate filenames better
convert .pnm -deskew <...> .pnm seems to have a bug that produces an
invalid .pnm file, which later causes Tesseract (specifically,
Leptonica) to choke (using versions 3.02 and 1.71, respectively). Will
change the pipeline to use TIFFs internally since they are less stupid.
2014-10-10 01:30:43 -07:00
Jim Barlow
017bc1f252
Basic error handling
2014-10-10 01:07:46 -07:00
Jim Barlow
bcd67c009d
Sort of working, but fragile; uses tmp folder properly now
2014-10-10 00:35:49 -07:00
Jim Barlow
2f6cfafdfc
Now produces a finished OCR-PDF page
2014-10-08 03:54:06 -07:00
Jim Barlow
25234fa30b
First crack at Ruffus, working well
2014-10-08 03:21:28 -07:00
Jim Barlow
dabbddb04e
deskew and clean
2014-09-27 15:03:07 -07:00
Jim Barlow
fccfb4589e
Moving quickly - we can now output .ppm files at correct resolution
2014-09-26 04:43:15 -07:00
Jim Barlow
5384c98013
Initial ocrpage.py rewrite in Python 3
2014-09-26 04:19:41 -07:00