2676 Commits

Author SHA1 Message Date
fritz-hh
38c64ac689 dependency to pdftk removed
concatenation is now also done with Ghostscript
2014-01-15 21:23:42 +01:00
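The commit above replaces pdftk with Ghostscript for concatenation. Here is a minimal Python sketch of such a call. It is an illustration, not the project's actual shell code, and the helper names `build_gs_concat_cmd` and `concat_pdfs` are hypothetical. The flags themselves (`-dBATCH`, `-dNOPAUSE`, `-q`, `-sDEVICE=pdfwrite`, `-sOutputFile=`) are standard Ghostscript options.

```python
import subprocess
from typing import List


def build_gs_concat_cmd(inputs: List[str], output: str) -> List[str]:
    """Build a Ghostscript command line that merges `inputs` into `output`.

    Hypothetical helper: the pdfwrite device interprets every input file
    in order and writes all pages into a single output PDF.
    """
    if not inputs:
        raise ValueError("need at least one input PDF")
    return [
        "gs",
        "-dBATCH",            # exit after processing all inputs
        "-dNOPAUSE",          # do not pause between pages
        "-q",                 # suppress informational output
        "-sDEVICE=pdfwrite",  # produce a PDF rather than a raster image
        "-sOutputFile=" + output,
        *inputs,              # pages are appended in this order
    ]


def concat_pdfs(inputs: List[str], output: str) -> None:
    # Raises subprocess.CalledProcessError if gs exits with a non-zero status.
    subprocess.run(build_gs_concat_cmd(inputs, output), check=True)
```

Building the argument list separately from running it keeps file names with spaces safe, because no shell is involved.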
fritz-hh
6d203e3eee portability improvements + minor changes 2014-01-15 21:23:41 +01:00
fritz-hh
81f461e557 disclaimer added 2014-01-14 23:46:33 +01:00
fritz-hh
988bde1387 tmpfiles to $TMPDIR + better portability (mktemp)
mktemp: support both FreeBSD/OS X and Linux, which have incompatible
syntax
From now on temporary files are saved in the folder specified by the
environment variable $TMPDIR
2014-01-14 22:57:10 +01:00
fritz-hh
aedbabdbe8 merged pull request from oxplot 2014-01-14 22:29:41 +01:00
fritz-hh
6ed53e53c7 Readme improved 2014-01-14 19:47:28 +01:00
Mansour Behabadi
a78630ce99 Make src scripts executable
Signed-off-by: Mansour Behabadi <mansour@oxplot.com>
2014-01-14 17:50:46 +11:00
Mansour Behabadi
6653066784 Use --gnu in parallel and XX for mktemp
Signed-off-by: Mansour Behabadi <mansour@oxplot.com>
2014-01-14 17:49:24 +11:00
fritz-hh
e40f1fa081 better handling of ligatures: fixes #58 2014-01-13 23:13:15 +01:00
fritz-hh
a872ce751d config file restructured
to make clear which parameters may be changed by the user
2014-01-13 22:11:28 +01:00
fritz-hh
317846fbdc Check that tmp folder creation was successful 2014-01-13 22:05:26 +01:00
fritz-hh
f581a55544 Merge pull request #57 from jbarlow83/for-upstream/tmpfolder
Fix temporary folder name generation collisions
2014-01-13 12:31:02 -08:00
fritz-hh
447b291e70 minor changes 2014-01-13 18:03:44 +01:00
fritz-hh
01d07253e8 indicate python2 to be used in header 2014-01-13 18:03:43 +01:00
fritz-hh
034a466094 Merge pull request #56 from jbarlow83/for-upstream/hocr-selfwidth
Fix AttributeError on self.width if Tesseract finds no OCR text
2014-01-13 08:44:16 -08:00
fritz-hh
c6211e2335 Merge pull request #55 from jbarlow83/for-upstream/check-poppler
Verify that pdftoppm is the Poppler version, not xpdf version
2014-01-13 08:42:33 -08:00
Jim Barlow
1d03a6417d Verify that pdftoppm is the Poppler version, not xpdf version 2014-01-12 22:12:09 -08:00
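The commit above distinguishes the Poppler and xpdf builds of `pdftoppm`. One way to do this is to inspect the version banner, as in the following hedged Python sketch. It assumes that Poppler's banner mentions "Poppler" (for example "The Poppler Developers") while the xpdf banner credits only Glyph & Cog. The function names are hypothetical, and the project's actual check may differ.

```python
import subprocess


def is_poppler_banner(version_text: str) -> bool:
    """Return True if `pdftoppm -v` output looks like the Poppler build.

    Assumption: Poppler's banner contains the word "Poppler"; the xpdf
    build's banner does not.
    """
    return "poppler" in version_text.lower()


def pdftoppm_is_poppler() -> bool:
    # The banner may be written to stderr rather than stdout, so check both.
    result = subprocess.run(["pdftoppm", "-v"], capture_output=True, text=True)
    return is_poppler_banner(result.stdout + result.stderr)
```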
Jim Barlow
1d62ef27a2 Fix AttributeError on self.width if Tesseract finds no OCR text
self.width remains undefined unless hOCR finds text.  It might not, for
example if a page contains only an image.

Full error message is:
AttributeError: 'hocrTransform' object has no attribute 'width'
2014-01-12 22:10:15 -08:00
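The general fix for this class of bug is to define the attribute unconditionally before parsing. The Python sketch below shows the pattern. It is not the project's `hocrTransform` code. The class `HocrPage` is hypothetical, and it assumes the page size can be read from the `bbox` in the `title` of the `ocr_page` element.

```python
import re
import xml.etree.ElementTree as ET

BBOX = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrPage:
    """Hypothetical minimal hOCR reader that always defines width/height."""

    def __init__(self, hocr_text: str) -> None:
        # Defined up front, so the attributes exist even if nothing is found.
        self.width = None
        self.height = None
        root = ET.fromstring(hocr_text)
        for elem in root.iter():  # iter() visits elements regardless of namespace
            if elem.get("class") == "ocr_page":
                match = BBOX.search(elem.get("title", ""))
                if match:
                    x0, y0, x1, y1 = map(int, match.groups())
                    self.width, self.height = x1 - x0, y1 - y0
                break
```

Callers can then test `page.width is None` instead of catching `AttributeError`.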
Jim Barlow
996048dc08 Fix temporary folder name generation collisions
First, the regular expression matches everything after the first period
in a filename.  Adding the $ makes it match the last, so that filenames
such as “Report.1.pdf” get trimmed to “Report.1”.

Next, use mktemp to get the OS to create a temporary folder.  It will
guarantee a unique directory name beginning with the prefix, even if
parallel processes are at work.
2014-01-12 22:05:11 -08:00
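Both ideas from the commit above can be sketched in Python: anchor the extension pattern at the end of the string, and let the OS create a uniquely named directory. This is an illustration, not the project's shell code. The pattern `\.[^.]*$` is an assumed equivalent of the anchored regex, and the helper names are hypothetical.

```python
import os
import re
import tempfile

# A period followed by non-period characters up to the end of the string:
# anchoring with $ means only the LAST extension is matched.
LAST_EXTENSION = re.compile(r"\.[^.]*$")


def strip_extension(filename: str) -> str:
    return LAST_EXTENSION.sub("", os.path.basename(filename))


def make_work_dir(filename: str) -> str:
    # mkdtemp creates the directory atomically with a random, unique suffix,
    # so parallel runs on files with the same base name cannot collide.
    # With no dir argument, tempfile picks a location that by default
    # consults $TMPDIR first.
    return tempfile.mkdtemp(prefix=strip_extension(filename) + ".")
```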
fritz-hh
bf02ee3bdc Resolved conflicts with jbarlow83 pull request 2014-01-12 15:37:14 +01:00
fritz-hh
a3c7fba02d minor changes (comments) 2014-01-11 22:26:29 +01:00
fritz-hh
a8cd7febf6 remove spurious space in img number
Tell the script that "nbImg" is a number, so that leading/trailing
spaces are removed
2014-01-11 22:15:53 +01:00
fritz-hh
20c008b84f avoid spurious error msg if no image in pdf 2014-01-11 22:05:19 +01:00
fritz-hh
7cd73566be check if python libs are installed
Check if reportlab and lxml are installed, otherwise exit with an error
2014-01-11 17:08:26 +01:00
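The dependency check described above can be sketched in Python as follows. The project itself targeted Python 2 at the time, so this Python 3 version using `importlib.util.find_spec` is an assumed modern equivalent. The function names are hypothetical.

```python
import importlib.util
import sys
from typing import Iterable, List

REQUIRED_MODULES = ("reportlab", "lxml")


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the names from `names` that cannot be found for import."""
    # find_spec returns None for an absent top-level module without importing it.
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_dependencies() -> None:
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit("Missing Python modules: " + ", ".join(missing))
```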
fritz-hh
e56fd53d06 poppler syntax (rather than xpdf syntax) 2014-01-11 16:19:52 +01:00
fritz-hh
810b1b3b3e Merge pull request #48 from jbarlow83/for-upstream/osx-errors
Fix pdffonts error when filename contains a space
2014-01-11 07:10:12 -08:00
fritz-hh
cb0b033fe7 Merge branch 'v2.x' of https://github.com/fritz-hh/OCRmyPDF into v2.x 2014-01-11 15:52:01 +01:00
fritz-hh
46f673a3b7 exit if bad parallel/tesseract version installed 2014-01-10 22:59:33 +01:00
fritz-hh
455303b3d4 parallel version added in RELEASE_NOTES 2014-01-10 22:12:58 +01:00
Jim Barlow
24a84d6380 Fix pdffonts error when filename contains a space 2014-01-09 16:44:24 -08:00
Jim Barlow
9aa2171052 Monkeypatch reportlab to output grayscale and monochrome colorspaces 2014-01-09 16:36:26 -08:00
Jim Barlow
3a46ea1f36 Merge branch 'for-upstream/pdftoppm-error' into for-upstream/mono 2014-01-09 16:20:05 -08:00
Jim Barlow
d33779f301 Detect monochrome images and extract them as PBM (1 bpp) 2014-01-09 16:15:24 -08:00
Jim Barlow
d6ea0793b8 Fix ocrPage.sh pdftoppm error on OS X 10.9 2014-01-09 16:04:37 -08:00
fritz-hh
4e5e5bb925 version changed to v2.x 2014-01-08 20:57:55 +01:00
fritz-hh
3232ed8e38 link to releases updated 2014-01-08 20:56:34 +01:00
fritz-hh
29d6748af8 release_notes and readme updated for v2.0-rc1 (tag: v2.0-rc1) 2014-01-07 23:13:42 +01:00
fritz-hh
828f195071 erroneous exit code corrected 2014-01-07 21:57:18 +01:00
fritz-hh
b0b7e32783 fixes #40 and code cleanup 2014-01-07 21:51:15 +01:00
fritz-hh
c1103c0248 check tesseract version
fixes #41
versions older than 3.02.02 are known to produce invalid hocr output (in
some cases)
2014-01-07 21:04:28 +01:00
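A minimum-version check like the one above is safest when the components are compared as numbers rather than as strings. The following hedged Python sketch uses the 3.02.02 threshold from the commit message. The parsing approach and function names are assumptions, not the project's code.

```python
import re
from typing import Tuple

MIN_TESSERACT = (3, 2, 2)  # 3.02.02, per the commit message


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn e.g. 'tesseract 3.02.02' into (3, 2, 2)."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    return tuple(int(part) for part in match.group(0).split("."))


def tesseract_is_supported(version_text: str) -> bool:
    # Tuples compare element by element: (3, 3) > (3, 2, 2) > (3, 2, 1),
    # and a shorter tuple with an equal prefix sorts first: (3, 2) < (3, 2, 2).
    return parse_version(version_text) >= MIN_TESSERACT
```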
fritz-hh
940a016e95 link to issue tracking system added 2014-01-06 23:12:15 +01:00
fritz-hh
c6cc098e47 create symbolic links and not copy
If deskew and/or cleanup is not requested, do not copy the files, but
just create a symbolic link.
This saves disk space and makes the script slightly quicker
2014-01-06 23:08:35 +01:00
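The copy-versus-link decision above can be sketched in Python. In this sketch, the helper `stage_page` and its parameters are hypothetical. The file copy stands in for whatever the deskew/cleanup tools would write in the real script.

```python
import os
import shutil


def stage_page(src: str, dst: str, deskew: bool, clean: bool) -> None:
    """Place the page image at `dst` for the next processing step.

    Hypothetical helper: when neither deskew nor cleanup will modify the
    image, a symbolic link is enough and avoids duplicating the data.
    """
    if deskew or clean:
        # A separate file is needed; copying stands in for the tool output.
        shutil.copyfile(src, dst)
    else:
        # Link to the absolute path so the link resolves from any directory.
        os.symlink(os.path.abspath(src), dst)
```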
fritz-hh
54f47ab89b Minor change 2014-01-06 22:41:43 +01:00
fritz-hh
fc3de64dce Changed debug page name
So that the debug page comes after the normal page in the final PDF
file
2014-01-06 22:41:29 +01:00
fritz-hh
414c4e3f3c round dpi value correctly 2014-01-06 22:32:11 +01:00
fritz-hh
6a9f38d31e removed unused variables 2014-01-06 22:23:41 +01:00
fritz-hh
aa4256d35c fixes #44
The x/y resolutions are no longer computed separately.
We no longer check whether the x and y resolutions differ (no
measure could be taken anyway if they were not equal...)
2014-01-06 22:23:00 +01:00
fritz-hh
8a1241ba44 minor changes (indentation and fct name) 2014-01-06 22:05:49 +01:00
fritz-hh
7eab052e0f Improved consistency of tmp file names 2014-01-06 22:00:58 +01:00
fritz-hh
552d19e36b v1.1-stable added in release notes 2014-01-06 20:09:46 +01:00