2895 Commits

Author SHA1 Message Date
Jim Barlow
25234fa30b First crack at Ruffus, working well 2014-10-08 03:21:28 -07:00
fritz-hh
5b17341804 Merge remote-tracking branch 'origin/v2.x' into v3.x 2014-10-07 22:06:05 +02:00
fritz-hh
9bedfa9a72 fixes #95
Exit if the output path points to a folder
Exit if the output path points to an existing file
2014-10-07 16:42:10 +02:00
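The two checks named in commit 9bedfa9a72 could be sketched roughly as follows. This is a minimal illustration in Python, not the project's actual code; the function name `check_output_path` and the message wording are assumptions made for the example.

```python
import os


def check_output_path(path):
    """Return an error message if `path` is unusable as an output file, else None.

    Hypothetical helper: mirrors the two conditions in the commit message.
    """
    # A directory can never be written as a single output file.
    if os.path.isdir(path):
        return "output path points to a folder: " + path
    # Anything else that already exists would be overwritten.
    if os.path.exists(path):
        return "output path points to an existing file: " + path
    return None
```

A caller would typically pass a non-None result to `sys.exit()`, which prints the message to stderr and exits with a nonzero status.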
fritz-hh
e1f1220970 make clear it is a draft from v3.x branch 2014-10-03 16:23:02 +02:00
fritz-hh
5855bcd1fe Merge remote-tracking branch 'origin/v2.x' into v3.x 2014-10-03 16:21:49 +02:00
fritz-hh
a14af5b9ee make clear it is a draft from v2.x branch 2014-10-03 16:18:20 +02:00
fritz-hh
f11c03750e typo 2014-10-03 16:16:17 +02:00
fritz-hh
ea5cfa40c1 Update ROADMAP.md 2014-10-03 16:13:56 +02:00
fritz-hh
c562754d81 typo 2014-10-03 16:11:26 +02:00
fritz-hh
90d892512a roadmap usage updated 2014-10-03 16:09:59 +02:00
fritz-hh
9c6fedb15b usage corrected [-f|-s] 2014-10-03 16:07:06 +02:00
fritz-hh
3a7175115f roadmap arguments specified 2014-10-03 16:03:02 +02:00
fritz-hh
98c41f3223 typo in usage 2014-10-03 15:44:14 +02:00
fritz-hh
d101e96e16 roadmap: better layout 2014-10-03 15:30:29 +02:00
fritz-hh
a446b6c440 roadmap rename steps 2014-10-03 15:19:20 +02:00
fritz-hh
b1fec0f1b1 roadmap detailed 2014-10-03 15:17:53 +02:00
fritz-hh
1dfdc93745 draft roadmap for v3.x 2014-10-03 15:02:17 +02:00
fritz-hh
6c5ee4095c default language now set in the config.sh file 2014-09-30 23:28:22 +02:00
fritz-hh
986fbf63a4 Introduce -s option + fix bug when -C not set
- Introduce -s option to not OCR pages containing fonts
- Solve issue with -f and -s if -C is not set
2014-09-30 23:16:31 +02:00
fritz-hh
2612105d32 correct download path 2014-09-29 22:29:25 +02:00
fritz-hh
954fe13f54 update release notes for v2.2-stable v2.2-stable 2014-09-29 22:25:02 +02:00
fritz-hh
bb5a00685e Make clear this is a draft 2014-09-28 21:10:04 +02:00
Jim Barlow
dabbddb04e deskew and clean 2014-09-27 15:03:07 -07:00
fritz-hh
5f173e5acb return right return code
Python does not map the expression to its return code automatically, so
this line returns success regardless of the reportlab version installed.
(I also realized that hasattr is superfluous).
2014-09-27 00:53:10 +02:00
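The behaviour described in commit 5f173e5acb can be reproduced with a small sketch. The comparison `1 >= 2` stands in for the real version check, which is not shown in the log; it is an assumption chosen only because it is always false.

```python
import subprocess
import sys

# A bare expression is evaluated and discarded, so the interpreter
# still exits with status 0 even though the comparison is false.
expr_only = subprocess.run([sys.executable, "-c", "1 >= 2"])

# Passing the result to sys.exit() explicitly turns it into an exit status.
explicit = subprocess.run(
    [sys.executable, "-c", "import sys; sys.exit(0 if 1 >= 2 else 1)"]
)

print(expr_only.returncode, explicit.returncode)  # prints "0 1"
```

A shell script that tests `$?` after the first form would therefore always see success.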
fritz-hh
b28ff40aea remove reportlab patch. fixes #91
remove patch that was required for versions of reportlab <3.0 (fixed in
3.0 now)
patch was necessary in order to reduce size of grayscale / b&w images
in pdf
2014-09-26 23:58:19 +02:00
Jim Barlow
fccfb4589e Moving quickly - we can now output .ppm files at correct resolution 2014-09-26 04:43:15 -07:00
Jim Barlow
5384c98013 Initial ocrpage.py rewrite into python3 2014-09-26 04:19:41 -07:00
fritz-hh
2ed2307573 Merge pull request #89 from jbarlow83/feature/readlink-osx
More portable solution (works also on OS X) to get OCRmyPDF.sh path (following symlinks)
2014-09-25 23:09:26 +02:00
Jim Barlow
3f8a2d8d3e Eliminate readlink entirely and do the same thing on all platforms 2014-09-25 13:47:35 -07:00
fritz-hh
1a13b7c85f Check if the input file exists
Previously I checked only whether the folder that should contain the
input file exists
2014-09-25 22:03:45 +02:00
Jim Barlow
d7130a1e56 Merge branch 'feature/keep-text-pages' into develop 2014-09-25 03:50:21 -07:00
Jim Barlow
f69054cb17 Fix parameter order problems
Put TESS_CFG_FILES last because it is optional and can be blank. If
omitted it breaks the sequence of subsequent parameters. Also cleanup
text output in this new mode.
2014-09-25 03:50:01 -07:00
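The ordering problem in commit f69054cb17 can be illustrated with a short sketch. It assumes the optional value reaches the callee through shell-style word splitting, which the commit message implies but does not state. `shlex.split` stands in for the shell here, and the command name `ocrpage` and the parameter names `lang`, `cfg`, and `out` are hypothetical.

```python
import shlex


def build_argv(template, **values):
    # Shell-like word splitting: an empty substitution leaves no argument behind.
    return shlex.split(template.format(**values))


def parse(argv, names):
    # Assign positional arguments to names in order, as a callee would.
    return dict(zip(names, argv[1:]))


# Optional value in the middle: when blank, later arguments shift left.
middle = parse(
    build_argv("ocrpage {lang} {cfg} {out}", lang="eng", cfg="", out="page.pdf"),
    ["lang", "cfg", "out"],
)

# Optional value last: when blank, it is simply absent.
last = parse(
    build_argv("ocrpage {lang} {out} {cfg}", lang="eng", cfg="", out="page.pdf"),
    ["lang", "out", "cfg"],
)

print(middle)  # {'lang': 'eng', 'cfg': 'page.pdf'}
print(last)    # {'lang': 'eng', 'out': 'page.pdf'}
```

In the first layout the output path is misread as the config value and the output path goes missing; in the second every required value stays in its expected position.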
Jim Barlow
80dc6eca2c Merge branches 'feature/readlink-osx' and 'feature/keep-text-pages' into develop
Conflicts:
	OCRmyPDF.sh
2014-09-25 03:14:10 -07:00
Jim Barlow
d250fbb3d6 Fix call to readlink on OS X
readlink -f is a GNU coreutils extension, so not available on OS X and
other platforms.
2014-09-25 03:11:27 -07:00
Jim Barlow
09bbe92611 Add command line option to skip pages that contain font data
If a page contains font data, the script would abort, unless -f was given,
in which case it would use pdftoppm to rasterize the font into a bitmap
and then attempt to OCR it. -f is almost certainly not what users want
unless they want to debug OCR or something.

If a PDF already has fonts it either was OCR'd already, or it is
a composite file containing, for example, some scanned documents appended
to a text report.  In the latter case, this -s option provides OCR on
pages that don't have it without changing those that do, and if a PDF
was completely OCRed it will be converted to PDF/A.  In batch jobs with
a mix of OCR and non-OCR the implicit conversion to PDF/A is also useful.
2014-09-25 02:43:40 -07:00
Jim Barlow
69d922e096 Check for missing pdftoppm when poppler installed with --disable-splash-output
When I upgraded to poppler 0.24.5, pdftoppm was not compiled because the
script had --disable-splash-output set for some reason.

For OS X Homebrew the solution is:
brew uninstall poppler
brew install poppler --with-splash-output
2014-09-25 02:30:29 -07:00
fritz-hh
d510e7e4ae prevent new spurious jhove message from being displayed 2014-09-24 23:43:37 +02:00
fritz-hh
5893290dd9 update to jhove v1.11 2014-09-24 23:17:39 +02:00
fritz-hh
5c3bbc4031 typo in OCRmyPDF.sh 2014-09-22 21:22:38 +02:00
fritz-hh
27cd8cf0db add link to heise open source 2014-09-20 20:47:02 +02:00
fritz-hh
b403016d5b Release notes updated for v2.1-stable v2.1-stable 2014-09-20 19:50:32 +02:00
fritz-hh
5a81823969 Merge pull request #82 from orbitcowboy/v2.x
Fixed typo
2014-09-20 19:02:33 +02:00
fritz-hh
17801401cd Merge pull request #83 from DorianScholz/v2.x
- small changes to make this work on Ubuntu 12.04 called via symlink
- lowered minimum parallel version
2014-09-20 18:59:57 +02:00
Dorian Scholz
5c7b2a2a36 lowered minimum version for parallel to 20121122 2014-09-10 13:27:59 +02:00
Dorian Scholz
1db06de287 added BASEPATH to allow for execution via symlink 2014-09-10 13:26:14 +02:00
Martin Ettl
3904178d44 Fixed typo 2014-09-09 07:01:04 +02:00
fritz-hh
8bb9c3610c Merge pull request #81 from MoritzFago/v2.x
fixed typo ghostcript to ghostscript
2014-09-08 18:31:00 +02:00
MoritzFago
7dcc382ccc fixed typo ghostcript to ghostscript 2014-09-08 16:52:49 +02:00
fritz-hh
b71fc807d2 Merge pull request #77 from andysigner/v2.x
Fixed typo in help text
2014-05-23 19:51:20 +02:00
Andy Signer
15d28d970a Fixed typo in help text 2014-05-23 12:41:31 +02:00