fritz-hh
50dee55606
File Test_Issue_#28 renamed
2013-11-27 22:30:43 +01:00
fritz-hh
da5cd01fe4
copyright line added
2013-05-06 23:13:29 +03:00
fritz-hh
d3fb317d41
readme updated
...
new feature: Process several pages in parallel if more than one CPU core
is available
2013-05-06 21:54:41 +02:00
fritz-hh
88ddeb1fb6
OCRmyPDF.sh: added dependency to GNU parallel
2013-05-06 21:54:05 +02:00
fritz-hh
f9e2e74bf3
Merge remote-tracking branch 'origin/v1.x' into v2.x
2013-05-06 21:35:34 +02:00
fritz-hh
87e01aff60
readme updated for v1.0-stable
v1.0-stable
2013-05-06 21:29:15 +02:00
fritz-hh
7e8481186a
OCRmyPDF.sh: metadata not added anymore
...
Removed the feature that added metadata to the final PDF file (because it led to a
final PDF file that does not comply with the PDF/A-1 format)
2013-05-06 21:26:33 +02:00
fritz-hh
2b0103a4e6
basic implementation of parallel page processing
...
- basic implementation of parallel page processing using GNU parallel
- processing around 40% faster on dual core processor
2013-05-05 22:33:54 +02:00
fritz-hh
064d4be83c
Merge remote-tracking branch 'origin/v1.x' into v2.x
...
Conflicts:
OCRmyPDF.sh
Fixes #31
2013-05-05 21:01:17 +02:00
fritz-hh
ab536d5678
OCRmyPDF.sh: fixes issue for files having spaces
...
fixes #31
2013-05-05 20:56:45 +02:00
fritz-hh
9db805c4ad
new file to OCR one page
...
Required to perform OCR of several pages in parallel (using GNU
parallel)
2013-05-05 20:45:27 +02:00
fritz-hh
f7923a9761
OCRmyPDF.sh: few variables renamed for clarity
2013-05-05 20:44:03 +02:00
fritz-hh
fd52650255
.gitattributes: handle *.jar and *.pdf as binary
2013-05-05 16:54:41 +02:00
fritz-hh
f0fe295175
jhove config: fixes #29
2013-05-05 16:36:54 +02:00
fritz-hh
2f89aa3935
.gitignore corrected + jhove jar files added
...
.gitignore file corrected, because it prevented some required jhove
binary files from being checked in (jar files)
2013-05-04 22:01:03 +02:00
fritz-hh
5aa27343e0
delete test file
2013-05-04 21:58:01 +02:00
fritz-hh
5ce2841389
JHove: deleted doc + source
...
Deleted a number of jhove files that are not required
(documentation and java source code mainly)
Goal: reduce size of the package
2013-05-04 21:55:39 +02:00
fritz-hh
e4ffb58269
OCRmyPDF.sh: provision for parallel pages processing
2013-05-02 22:06:16 +02:00
fritz-hh
2ce3d9e19d
added file to reproduce #28
2013-05-02 17:21:17 +02:00
fritz-hh
9271fe73a8
OCRmyPDF.sh: fixes #27
...
The fix should now be compatible with most implementations of grep
2013-05-02 16:51:46 +02:00
fritz-hh
edaa70b97f
OCRmyPDF.sh: fixes #25 and fixes #26
...
- In debug mode: compute and echo time required for processing
- Resolutions (x/y) that are nearly equal are not supported (because the
test did not take into account imprecision due to truncation)
2013-05-01 15:58:55 +02:00
fritz-hh
ab07f4deea
OCRmyPDF.sh: handling of paths with spaces
...
- corrected function absolutePath() to handle paths with spaces correctly
- pdf title metadata: file name split on case change
- change of owner/group/permission removed from code
- improved logging
2013-05-01 13:44:20 +02:00
fritz-hh
beb1d7ab54
release notes: updated for v1.0-rc2
v1.0-rc2
2013-04-29 12:27:43 +03:00
fritz-hh
5ce3e9bfec
OCRmyPDF.sh: Version number updated
2013-04-29 12:19:19 +03:00
fritz-hh
2bed210a30
OCRmyPDF.sh: added metadata in final pdf file
...
- added metadata in final pdf file: fixes #4
- improved logging of PDF/A validation results
2013-04-28 22:18:34 +02:00
fritz-hh
2441551156
OCRmyPDF.sh: final pdf same owner & permissions
...
fixes #9
2013-04-28 15:54:31 +02:00
fritz-hh
15baca5e08
HocrTransform.py: exit if page size is not found
...
fixes #21
2013-04-28 14:56:14 +02:00
fritz-hh
062ef0ca3a
OCRmyPDF.sh: keep tmp files in debug mode
...
fixes #22
2013-04-28 14:43:21 +02:00
fritz-hh
24b4686944
release notes: unpaper version added
2013-04-27 14:03:17 +03:00
fritz-hh
d3d1c20ca2
Correct version number
...
fixes #19
2013-04-27 14:00:59 +03:00
fritz-hh
7f7b81154f
Merge branch 'master' of https://github.com/fritz-hh/OCRmyPDF
2013-04-26 19:37:26 +02:00
fritz-hh
6372cec6b8
OCRmyPDF.sh: Fixed major problem with deskew
...
After deskew, the images were cropped to the wrong size
2013-04-26 19:37:02 +02:00
fritz-hh
5ec875325e
Update README.md
2013-04-26 18:46:18 +03:00
fritz-hh
4d80709cfd
Update README.md
2013-04-26 18:43:15 +03:00
fritz-hh
2642c1b3d3
Update README.md
2013-04-26 18:00:58 +03:00
fritz-hh
c4cd7e1982
Merge branch 'master' of https://github.com/fritz-hh/OCRmyPDF
v1.0-rc1
2013-04-26 16:53:19 +02:00
fritz-hh
b993c158d0
OCRmyPDF: log msg corrected
2013-04-26 16:52:58 +02:00
fritz-hh
1b727042fe
release notes updated for v1.0-rc1
2013-04-26 17:50:54 +03:00
fritz-hh
ec26736577
folder structure cleaned
...
- put all src files (except OCRmyPDF.sh) to src
- rename tesseract_cfg to tess-cfg
2013-04-26 16:34:49 +02:00
fritz-hh
a766c5f2b7
typo
2013-04-26 16:19:18 +02:00
fritz-hh
3b2c804f23
Update README.md
2013-04-26 17:15:55 +03:00
fritz-hh
486ed6f217
Readme updated
2013-04-26 16:12:13 +02:00
fritz-hh
ae716a91cb
jhove paths corrected
2013-04-26 16:11:59 +02:00
fritz-hh
d4195b4362
jhove package added
2013-04-26 14:46:47 +02:00
fritz-hh
6ae0452d87
added readme
2013-04-26 14:28:04 +02:00
fritz-hh
815117f653
add test script
...
aimed at checking if the quality of the images drops quickly or not
2013-04-26 14:26:03 +02:00
fritz-hh
1c0eb03b3b
OCRmyPDF.sh: minor improvements
...
- additional data logged
- width/height were inverted: corrected
- few other minor changes
2013-04-26 14:20:45 +02:00
fritz-hh
3249fba4a2
OCRmyPDF.sh: log to stderr + check PDF/A profile
...
- fixes #10
- check not only if the final PDF is well formed and valid, but also if
it conforms to the PDF/A profile
2013-04-26 12:23:29 +02:00
fritz-hh
7c173dcc67
Merge branch 'master' of https://github.com/fritz-hh/OCRmyPDF
2013-04-26 11:50:52 +02:00
fritz-hh
357f449e07
OCRmyPDF.sh: check if python is installed
...
- fixes #14
- minor other changes
2013-04-26 11:50:39 +02:00