Jim Barlow
25234fa30b
First crack at Ruffus, working well
2014-10-08 03:21:28 -07:00
fritz-hh
5b17341804
Merge remote-tracking branch 'origin/v2.x' into v3.x
2014-10-07 22:06:05 +02:00
fritz-hh
9bedfa9a72
fixes #95
...
Exit if the output path points to a folder
Exit if the output path points to an existing file
2014-10-07 16:42:10 +02:00
fritz-hh
e1f1220970
make clear it is a draft from v3.x branch
2014-10-03 16:23:02 +02:00
fritz-hh
5855bcd1fe
Merge remote-tracking branch 'origin/v2.x' into v3.x
2014-10-03 16:21:49 +02:00
fritz-hh
a14af5b9ee
make clear it is a draft from v2.x branch
2014-10-03 16:18:20 +02:00
fritz-hh
f11c03750e
typo
2014-10-03 16:16:17 +02:00
fritz-hh
ea5cfa40c1
Update ROADMAP.md
2014-10-03 16:13:56 +02:00
fritz-hh
c562754d81
typo
2014-10-03 16:11:26 +02:00
fritz-hh
90d892512a
roadmap usage updated
2014-10-03 16:09:59 +02:00
fritz-hh
9c6fedb15b
usage corrected [-f|-s]
2014-10-03 16:07:06 +02:00
fritz-hh
3a7175115f
roadmap arguments specified
2014-10-03 16:03:02 +02:00
fritz-hh
98c41f3223
typo in usage
2014-10-03 15:44:14 +02:00
fritz-hh
d101e96e16
roadmap: better layout
2014-10-03 15:30:29 +02:00
fritz-hh
a446b6c440
roadmap rename steps
2014-10-03 15:19:20 +02:00
fritz-hh
b1fec0f1b1
roadmap detailed
2014-10-03 15:17:53 +02:00
fritz-hh
1dfdc93745
draft roadmap for v3.x
2014-10-03 15:02:17 +02:00
fritz-hh
6c5ee4095c
default language now set in the config.sh file
2014-09-30 23:28:22 +02:00
fritz-hh
986fbf63a4
Introduce -s option + fix bug when -C not set
...
- Introduce -s option to skip OCR on pages containing fonts
- Solve issue with -f and -s if -C is not set
2014-09-30 23:16:31 +02:00
fritz-hh
2612105d32
correct download path
2014-09-29 22:29:25 +02:00
fritz-hh
954fe13f54
update release notes for v2.2-stable
v2.2-stable
2014-09-29 22:25:02 +02:00
fritz-hh
bb5a00685e
Make clear this is a draft
2014-09-28 21:10:04 +02:00
Jim Barlow
dabbddb04e
deskew and clean
2014-09-27 15:03:07 -07:00
fritz-hh
5f173e5acb
return right return code
...
Python does not map the expression to its return code automatically, so
this line returns success regardless of the reportlab version installed.
(I also realized that hasattr is superfluous).
2014-09-27 00:53:10 +02:00
fritz-hh
b28ff40aea
remove reportlab patch. fixes #91
...
remove patch that was required for versions of reportlab <3.0 (fixed in
3.0 now)
patch was necessary in order to reduce size of grayscale / b&w images
in pdf
2014-09-26 23:58:19 +02:00
Jim Barlow
fccfb4589e
Moving quickly - we can now output .ppm files at correct resolution
2014-09-26 04:43:15 -07:00
Jim Barlow
5384c98013
Initial ocrpage.py rewrite into python3
2014-09-26 04:19:41 -07:00
fritz-hh
2ed2307573
Merge pull request #89 from jbarlow83/feature/readlink-osx
...
More portable solution (works also on OS X) to get OCRmyPDF.sh path (following symlinks)
2014-09-25 23:09:26 +02:00
Jim Barlow
3f8a2d8d3e
Eliminate readlink entirely and do the same thing on all platforms
2014-09-25 13:47:35 -07:00
fritz-hh
1a13b7c85f
Check if the input file exists
...
Previously I checked only whether the folder that should contain the
input file exists
2014-09-25 22:03:45 +02:00
Jim Barlow
d7130a1e56
Merge branch 'feature/keep-text-pages' into develop
2014-09-25 03:50:21 -07:00
Jim Barlow
f69054cb17
Fix parameter order problems
...
Put TESS_CFG_FILES last because it is optional and can be blank. If
omitted it breaks the sequence of subsequent parameters. Also cleanup
text output in this new mode.
2014-09-25 03:50:01 -07:00
Jim Barlow
80dc6eca2c
Merge branches 'feature/readlink-osx' and 'feature/keep-text-pages' into develop
...
Conflicts:
OCRmyPDF.sh
2014-09-25 03:14:10 -07:00
Jim Barlow
d250fbb3d6
Fix call to readlink on OS X
...
readlink -f is a GNU coreutils extension, so not available on OS X and
other platforms.
2014-09-25 03:11:27 -07:00
Jim Barlow
09bbe92611
Add command line option to skip pages that contain font data
...
If a page contains font data, the script would abort, unless -f was given,
in which case it would use pdftoppm to rasterize the font into a bitmap
and then attempt to OCR it. -f is almost certainly not what users want
unless they want to debug OCR or something.
If a PDF already has fonts it either was OCR'd already, or it is
a composite file containing, for example, some scanned documents appended
to a text report. In the latter case, this -s option provides OCR on
pages that don't have it without changing those that do, and if a PDF
was completely OCRed it will be converted to PDF/A. In batch jobs with
a mix of OCR and non-OCR the implicit conversion to PDF/A is also useful.
2014-09-25 02:43:40 -07:00
Jim Barlow
69d922e096
Check for missing pdftoppm when poppler installed with --disable-splash-output
...
When I upgraded to poppler 0.24.5, pdftoppm was not compiled because the
script had --disable-splash-output set for some reason.
For OS X Homebrew the solution is:
brew uninstall poppler
brew install poppler --with-splash-output
2014-09-25 02:30:29 -07:00
fritz-hh
d510e7e4ae
prevent new spurious jhove message from being displayed
2014-09-24 23:43:37 +02:00
fritz-hh
5893290dd9
update to jhove v1.11
2014-09-24 23:17:39 +02:00
fritz-hh
5c3bbc4031
typo in OCRmyPDF.sh
2014-09-22 21:22:38 +02:00
fritz-hh
27cd8cf0db
add link to heise open source
2014-09-20 20:47:02 +02:00
fritz-hh
b403016d5b
Release notes updated for v2.1-stable
v2.1-stable
2014-09-20 19:50:32 +02:00
fritz-hh
5a81823969
Merge pull request #82 from orbitcowboy/v2.x
...
Fixed typo
2014-09-20 19:02:33 +02:00
fritz-hh
17801401cd
Merge pull request #83 from DorianScholz/v2.x
...
- small changes to make this work on Ubuntu 12.04 called via symlink
- lowered minimum parallel version
2014-09-20 18:59:57 +02:00
Dorian Scholz
5c7b2a2a36
lowered minimum version for parallel to 20121122
2014-09-10 13:27:59 +02:00
Dorian Scholz
1db06de287
added BASEPATH to allow for execution via symlink
2014-09-10 13:26:14 +02:00
Martin Ettl
3904178d44
Fixed typo
2014-09-09 07:01:04 +02:00
fritz-hh
8bb9c3610c
Merge pull request #81 from MoritzFago/v2.x
...
fixed typo ghostcript to ghostscript
2014-09-08 18:31:00 +02:00
MoritzFago
7dcc382ccc
fixed typo ghostcript to ghostscript
2014-09-08 16:52:49 +02:00
fritz-hh
b71fc807d2
Merge pull request #77 from andysigner/v2.x
...
Fixed typo in help text
2014-05-23 19:51:20 +02:00
Andy Signer
15d28d970a
Fixed typo in help text
2014-05-23 12:41:31 +02:00