fritz-hh
9271fe73a8
OCRmyPDF.sh: fixes #27
...
The fix should now be compatible with most implementations of grep
2013-05-02 16:51:46 +02:00
fritz-hh
edaa70b97f
OCRmyPDF.sh: fixes #25 and fixes #26
...
- In debug mode: compute and echo time required for processing
- Resolutions (x/y) that are nearly equal are not supported (because the
test did not take into account imprecision due to truncation)
2013-05-01 15:58:55 +02:00
fritz-hh
ab07f4deea
OCRmyPDF.sh: handling of path with spaces
...
- corrected function absolutePath() to handle path with spaces correctly
- pdf title metadata: file name split on case change
- change of owner/group/permission removed from code
- improved logging
2013-05-01 13:44:20 +02:00
fritz-hh
5ce3e9bfec
OCRmyPDF.sh: Version number updated
2013-04-29 12:19:19 +03:00
fritz-hh
2bed210a30
OCRmyPDF.sh: added metadata in final pdf file
...
- added metadata in final pdf file: fixes #4
- improved logging of PDF/A validation results
2013-04-28 22:18:34 +02:00
fritz-hh
2441551156
OCRmyPDF.sh: final pdf same owner & permissions
...
fixes #9
2013-04-28 15:54:31 +02:00
fritz-hh
062ef0ca3a
OCRmyPDF.sh: keep tmp files in debug mode
...
fixes #22
2013-04-28 14:43:21 +02:00
fritz-hh
d3d1c20ca2
Correct version number
...
fixes #19
2013-04-27 14:00:59 +03:00
fritz-hh
6372cec6b8
OCRmyPDF.sh: Fixed major problem with deskew
...
After deskew the image was cropped to the wrong size
2013-04-26 19:37:02 +02:00
fritz-hh
b993c158d0
OCRmyPDF: log msg corrected
2013-04-26 16:52:58 +02:00
fritz-hh
ec26736577
folder structure cleaned
...
- put all src files (except OCRmyPDF.sh) to src
- rename tesseract_cfg to tess-cfg
2013-04-26 16:34:49 +02:00
fritz-hh
a766c5f2b7
typo
2013-04-26 16:19:18 +02:00
fritz-hh
ae716a91cb
jhove paths corrected
2013-04-26 16:11:59 +02:00
fritz-hh
d4195b4362
jhove package added
2013-04-26 14:46:47 +02:00
fritz-hh
1c0eb03b3b
OCRmyPDF.sh: minor improvements
...
- additional data logged
- width/height were inverted: corrected
- few other minor changes
2013-04-26 14:20:45 +02:00
fritz-hh
3249fba4a2
OCRmyPDF.sh: log to stderr + check PDF/A profile
...
- fixes #10
- check not only if the final PDF is well formed and valid, but also if
it conforms to the PDF/A profile
2013-04-26 12:23:29 +02:00
fritz-hh
357f449e07
OCRmyPDF.sh: check if python is installed
...
- fixes #14
- minor other changes
2013-04-26 11:50:39 +02:00
fritz-hh
ee738be681
Fixed: issue with deskewing: size sometimes wrong
...
fixes #13
2013-04-25 11:13:30 +02:00
fritz-hh
e21b3155e5
OCRmyPDF.sh: corrected dpi computation
...
fixes #12
2013-04-24 21:12:35 +02:00
fritz-hh
c293ffd621
OCRmyPDF.sh: minor change in code documentation
2013-04-23 22:57:41 +02:00
fritz-hh
2fdaa7595c
OCRmyPDF.sh: better handling of path and tmp folder
...
- user can now define the name/location of the output file
- check if the folder in which in/output files should be located exists
- tmp folder now built using timestamp and input file name
2013-04-23 22:54:58 +02:00
fritz-hh
968a66f66b
Merge branch 'master' of https://github.com/fritz-hh/OCRmyPDF
2013-04-23 21:43:34 +02:00
fritz-hh
5992afb707
Support for additional tesseract config files
...
This corresponds to the -C option
2013-04-23 21:36:34 +02:00
fritz-hh
9aa83215c4
OCRmyPDF.sh: typo in usage
2013-04-23 00:35:42 +03:00
fritz-hh
4ce249e6ed
OCRmyPDF.sh: new debug option (-g) added
2013-04-22 22:50:34 +02:00
fritz-hh
b9a346ce7d
OCRmyPDF.sh: log levels implemented
...
fixes #5
2013-04-22 20:56:45 +02:00
fritz-hh
64b92ed180
Usage described
...
fixes #6
2013-04-22 20:35:02 +02:00
fritz-hh
c5f2158b85
OCRmyPDF.sh: various changes
...
fixes #3
fixes #2
2013-04-21 21:59:42 +02:00
fritz-hh
d5a3f76234
OCRmyPDF.sh: various improvements
...
- check if x_dpi = y_dpi
- separate options for image deskewing and cleaning
- exit codes defined as constants
2013-04-20 22:03:23 +02:00
fritz-hh
f3e581d162
OCRmyPDF.sh: minor changes
2013-04-19 23:00:00 +02:00
fritz-hh
4f65a31eba
OCRmyPDF.sh: check if utilities are installed
2013-04-19 22:23:28 +02:00
fritz-hh
35d8cffad4
OCRmyPDF.sh: fix error exit not exiting
...
Fixes an error that led the script not to exit correctly in case more
than 1 image is detected on a page
2013-04-19 21:27:40 +02:00
fritz-hh
0c46a723bd
OCRmyPDF.sh: many improvements!
...
- automatic analysis of jhove validation report
- quiet generation of PDF/A with gs
- deletion of tmp files
- Corrected issue that led to a crash at page 8
- Improved log
2013-04-18 23:13:06 +02:00
fritz-hh
fcac99bc73
OCRmyPDF.sh: code clean-up
2013-04-18 11:16:40 +02:00
fritz-hh
7c3abea232
OCRmyPDF.sh: page number now with leading zeros
2013-04-18 10:43:28 +02:00
fritz-hh
2c23bca913
OCRmyPDF.sh: conversion to PDF/A added
2013-04-18 10:31:36 +02:00
fritz-hh
4188d702ed
OCRmyPDF.sh: computation of resolution
...
Added computation of the resolution of each PDF page
Added extraction of the image as pgm if colorspace is Gray (to speed up
computation and save space)
2013-04-14 19:15:01 +02:00
fritz-hh
318c77b934
OCRmyPDF.sh: prepare intelligent image extraction
...
preparation of extraction of the image in the same resolution as the
original image inside the pdf file
2013-04-13 12:35:26 +02:00
fritz-hh
b041c0080b
OCRmyPDF.sh: new cmd line I/F of hocrTransform.py
...
Adapted to the new cmd line I/F of hocrTransform.py
2013-04-11 20:29:10 +02:00
fritz-hh
4e4b5ddc58
initial version
2013-04-09 19:00:26 +02:00