2895 Commits

Author SHA1 Message Date
fritz-hh
83560cbd1d hocrTransform: font changed to Helvetica
- Font changed to Helvetica (instead of Courier)
- License text deleted (license file already available)
2013-04-26 11:49:21 +02:00
fritz-hh
1860f80cae Update COPYRIGHT.md 2013-04-26 12:19:28 +03:00
fritz-hh
4ea97c4fe4 Update README.md 2013-04-25 12:20:26 +03:00
fritz-hh
ee738be681 Fixed issue with deskewing: size sometimes wrong
fixes #13
2013-04-25 11:13:30 +02:00
fritz-hh
e21b3155e5 OCRmyPDF.sh: corrected dpi computation
fixes #12
2013-04-24 21:12:35 +02:00
fritz-hh
c293ffd621 OCRmyPDF.sh: minor change in code documentation 2013-04-23 22:57:41 +02:00
fritz-hh
2fdaa7595c OCRmyPDF.sh: better handling of path and tmp folder
- user can now define the name/location of the output file
- check whether the folders in which the input/output files should be located exist
- tmp folder name now built from a timestamp and the input file name
2013-04-23 22:54:58 +02:00
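The log does not show the exact tmp-folder naming scheme, so the following is only a sketch of the idea described in commit 2fdaa7595c: combine a timestamp with the input file's base name. The helper name `make_tmp_folder_name` and the `%Y%m%d_%H%M%S` format are hypothetical.

```python
import os
import tempfile
from datetime import datetime
from typing import Optional


def make_tmp_folder_name(input_file: str, now: Optional[datetime] = None) -> str:
    """Hypothetical helper: build a tmp folder path from a timestamp and the
    input file's base name (without extension), inside the system tmp dir."""
    now = now or datetime.now()
    stem = os.path.splitext(os.path.basename(input_file))[0]
    # datetime supports strftime-style format specs inside f-strings
    return os.path.join(tempfile.gettempdir(), f"{now:%Y%m%d_%H%M%S}_{stem}")
```

Including the timestamp keeps two runs on the same input file from colliding, and the file name makes the folder easy to identify when debugging.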
fritz-hh
968a66f66b Merge branch 'master' of https://github.com/fritz-hh/OCRmyPDF 2013-04-23 21:43:34 +02:00
fritz-hh
5992afb707 Support for additional tesseract config files
This corresponds to the -C option
2013-04-23 21:36:34 +02:00
fritz-hh
9aa83215c4 OCRmyPDF.sh: typo in usage 2013-04-23 00:35:42 +03:00
fritz-hh
939a148812 Update README.md 2013-04-23 00:33:48 +03:00
fritz-hh
4ce249e6ed OCRmyPDF.sh: new debug option (-g) added 2013-04-22 22:50:34 +02:00
fritz-hh
422aaa80f3 hocrTransform.py: various changes
-a option removed
bounding boxes for paragraphs added
color and style of bounding boxes improved
2013-04-22 22:48:41 +02:00
fritz-hh
b9a346ce7d OCRmyPDF.sh: log levels implemented
fixes #5
2013-04-22 20:56:45 +02:00
fritz-hh
64b92ed180 Usage described
fixes #6
2013-04-22 20:35:02 +02:00
fritz-hh
90fc5c9de4 Update README.md 2013-04-21 23:13:04 +03:00
fritz-hh
7118c2f04b Update README.md 2013-04-21 23:09:32 +03:00
fritz-hh
d66712ab42 Update README.md 2013-04-21 23:09:00 +03:00
fritz-hh
c5f2158b85 OCRmyPDF.sh: various changes
fixes #3
fixes #2
2013-04-21 21:59:42 +02:00
fritz-hh
a5c5353fbd Create COPYRIGHT.md 2013-04-20 23:22:57 +03:00
fritz-hh
d5a3f76234 OCRmyPDF.sh: various improvements
- check if x_dpi = y_dpi
- separate options for image deskewing and cleaning
- exit codes defined as constants
2013-04-20 22:03:23 +02:00
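Commit d5a3f76234 adds a check that x_dpi equals y_dpi and defines exit codes as constants. Below is a minimal sketch of that check. It assumes a relative tolerance, because computed resolutions are floats and rarely match exactly. The constant names, their values and the 1% tolerance are hypothetical and not taken from OCRmyPDF.sh.

```python
import math

# Hypothetical exit codes defined as named constants (values are illustrative)
EXIT_OK = 0
EXIT_NON_SQUARE_PIXELS = 3


def check_dpi(x_dpi: float, y_dpi: float, rel_tol: float = 0.01) -> int:
    """Return EXIT_OK if the horizontal and vertical resolutions agree within
    rel_tol (relative), otherwise the non-square-pixel error code."""
    if math.isclose(x_dpi, y_dpi, rel_tol=rel_tol):
        return EXIT_OK
    return EXIT_NON_SQUARE_PIXELS
```

Returning the code instead of calling `sys.exit` keeps the check testable. The caller decides when to exit.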
fritz-hh
7c18203845 Update README.md 2013-04-20 00:15:14 +03:00
fritz-hh
d7c238723b readme: new sections "features" & "Motivation" 2013-04-19 23:00:35 +02:00
fritz-hh
f3e581d162 OCRmyPDF.sh: minor changes 2013-04-19 23:00:00 +02:00
fritz-hh
4f65a31eba OCRmyPDF.sh: check if utilities are installed 2013-04-19 22:23:28 +02:00
fritz-hh
35d8cffad4 OCRmyPDF.sh: fix script not exiting on error
Fixes an error that caused the script not to exit correctly when more
than one image is detected on a page
2013-04-19 21:27:40 +02:00
fritz-hh
0c46a723bd OCRmyPDF.sh: many improvements!
- automatic analysis of jhove validation report
- quiet generation of PDF/A with gs
- deletion of tmp files
- Corrected issue that led to a crash at page 8
- Improved log
2013-04-18 23:13:06 +02:00
fritz-hh
fcac99bc73 OCRmyPDF.sh: code clean-up 2013-04-18 11:16:40 +02:00
fritz-hh
42208aa5fe Readme: Installation section started 2013-04-18 10:44:10 +02:00
fritz-hh
7c3abea232 OCRmyPDF.sh: page number now with leading zeros 2013-04-18 10:43:28 +02:00
fritz-hh
2c23bca913 OCRmyPDF.sh: conversion to PDF/A added 2013-04-18 10:31:36 +02:00
fritz-hh
4188d702ed OCRmyPDF.sh: computation of resolution
Added computation of the resolution of each PDF page
Added extraction of the image as PGM if the colorspace is Gray (to speed up
computation and save space)
2013-04-14 19:15:01 +02:00
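Commit 4188d702ed computes the resolution of each PDF page. The log does not say how, so the following is only a sketch under one stated assumption: the page holds a single image that fills the whole page. PDF page sizes are expressed in points, and 1 pt = 1/72 inch, so dpi = pixels / (points / 72). The function name `page_dpi` is hypothetical.

```python
from typing import Tuple


def page_dpi(img_width_px: int, img_height_px: int,
             page_width_pt: float, page_height_pt: float) -> Tuple[float, float]:
    """Compute (x_dpi, y_dpi) for an image assumed to cover the full page.

    Page dimensions are in PDF points (1 pt = 1/72 inch)."""
    x_dpi = img_width_px / (page_width_pt / 72.0)
    y_dpi = img_height_px / (page_height_pt / 72.0)
    return x_dpi, y_dpi
```

For example, an A4 page (595 x 842 pt) holding a 2480 x 3508 px scan yields roughly 300 dpi in both directions.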
fritz-hh
318c77b934 OCRmyPDF.sh: prepare intelligent image extraction
Preparation for extracting the image at the same resolution as the
original image inside the PDF file
2013-04-13 12:35:26 +02:00
fritz-hh
b041c0080b OCRmyPDF.sh: new cmd line I/F of hocrTransform.py
Adapted to the new cmd line I/F of hocrTransform.py
2013-04-11 20:29:10 +02:00
fritz-hh
ed93878851 hocrTransform.py: cmd line interface improved
Command line interface improved in order to allow:
- showing bounding box borders
- setting the OCR resolution
- showing text above the image
2013-04-11 20:24:37 +02:00
fritz-hh
df56c134e4 hocrTransform.py: moved size computation to init 2013-04-10 16:33:03 +03:00
fritz-hh
c51babfd27 hocrTransform.py: A4 page size corrected 2013-04-10 16:22:15 +03:00
fritz-hh
8fdbfc3c95 hocrTransform.py: license added 2013-04-10 16:19:46 +03:00
fritz-hh
4d378c3b14 hocrTransform: code cleanup 2013-04-09 21:51:39 +02:00
fritz-hh
81d5b7b5e5 readme: warning that the project is still in development 2013-04-09 21:35:47 +02:00
fritz-hh
accc082b91 hocrTransform: code cleanup 2013-04-09 21:35:22 +02:00
fritz-hh
4e4b5ddc58 initial version 2013-04-09 19:00:26 +02:00
fritz-hh
4202826dfa gitignore, gitattributes and releaseNotes added 2013-04-09 18:54:14 +02:00
fritz-hh
b011ddd2d9 Update README.md 2013-04-09 19:53:17 +03:00
fritz-hh
7972a156fc Initial commit 2013-04-09 09:44:46 -07:00