James R. Barlow
2d15c09cca
Merge branch 'develop'
2016-02-06 18:18:49 -08:00
James R. Barlow
04cb8865b0
Fetch application from PyPI instead of local
...
setuptools_scm barfs because it can't find the version, because Docker Hub
retrieves the application from GitHub in a way that omits the necessary
details.
I suppose there is a certain logic to Docker only using the tagged
released versions from PyPI, so go with it. The other attractive option
is to nix setuptools_scm.
2016-02-06 18:18:30 -08:00
James R. Barlow
6fe32bbaf7
v3.2.1
v3.2.1
2016-02-05 16:10:18 -08:00
James R. Barlow
4abb20390d
Bump Dockerfile versions
2016-02-05 16:08:26 -08:00
James R. Barlow
daa3916430
Fix img2pdf 0.2 usage
...
All tests pass when forced to rely on img2pdf, so seems okay
2016-02-05 15:13:26 -08:00
James R. Barlow
e9b87cefcc
Try img2pdf 0.2
2016-02-05 14:38:37 -08:00
James R. Barlow
60593b5ad3
Tighten up package requirements to deal with incompatible img2pdf 0.2 release
2016-02-05 14:37:05 -08:00
James R. Barlow
f708b11ea4
Fix Python 2.7 warning
2016-02-05 02:34:49 -08:00
James R. Barlow
7982f58b2e
Try tweaking Dockerfile for automated build again
v3.2.post2
2016-02-05 01:38:59 -08:00
James R. Barlow
e805c1908a
Minor fix for Dockerfile polyglot
v3.2.post1
2016-02-05 00:52:27 -08:00
James R. Barlow
cb3ba8e973
Merge branch 'release/v3.2' into develop
2016-02-05 00:10:41 -08:00
James R. Barlow
344fc40cbc
Merge branch 'release/v3.2'
v3.2
2016-02-05 00:10:41 -08:00
James R. Barlow
7e5c37137b
Merge branch 'develop' into release/v3.2
2016-02-04 23:42:06 -08:00
James R. Barlow
1aae11714b
Update release notes for v3.2
2016-02-04 23:41:33 -08:00
James R. Barlow
d82f14a7aa
Update .gitignore
2016-02-04 18:51:41 -08:00
James R. Barlow
4b65e0b093
Set JPEG output quality to 95 for better transcoding
2016-02-04 18:49:09 -08:00
James R. Barlow
43b0faa830
Bug in tesseract_noop spoof: produced wrong page sizes
...
Now checks input image to ensure the implied page size of its .hocr file
matches the rest of the PDF.
2016-02-04 18:48:22 -08:00
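The page-size check described in 43b0faa830 comes down to converting pixel dimensions to PDF points (72 per inch) and comparing. A minimal sketch of that arithmetic follows; the function names and the 1 pt tolerance are hypothetical and are not taken from the project's code.

```python
def implied_page_size_pt(width_px, height_px, dpi_x, dpi_y):
    """Return the (width, height) in PDF points implied by an image's pixels and DPI."""
    # 1 inch = 72 PDF points, so points = pixels / dpi * 72.
    return (width_px / dpi_x * 72.0, height_px / dpi_y * 72.0)


def page_sizes_match(size_a, size_b, tolerance_pt=1.0):
    """Compare two (width, height) sizes in points, allowing a small rounding tolerance."""
    return (abs(size_a[0] - size_b[0]) <= tolerance_pt
            and abs(size_a[1] - size_b[1]) <= tolerance_pt)


# A 2550 x 3300 pixel scan at 300 DPI implies a US Letter page.
letter = implied_page_size_pt(2550, 3300, 300, 300)
print(letter)  # (612.0, 792.0)
```

The tolerance is there because DPI values stored in image files are often rounded, so an exact comparison would reject pages that are effectively the same size.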
James R. Barlow
8674c9fb20
Merge commit 'ccfbb54e8c26784e438ba2fcac2179f21e7d857b' into release/v3.2
2016-02-04 17:39:36 -08:00
jbarlow83
ccfbb54e8c
Update release notes for v3.2
...
Fix the notes
2016-02-04 17:37:30 -08:00
James R. Barlow
9893ebf889
Suppress tesseract argument printout
2016-02-04 17:26:36 -08:00
James R. Barlow
303eb3e93a
Merge commit 'ca546d70e5bff9e9b115371f7813f3c326822bd8' into release/v3.2
2016-02-04 17:25:56 -08:00
jbarlow83
ca546d70e5
Merge pull request #45 from spwhitton/hocrtransform-shebang-fix
...
fix shebang in hocrtransform.py
2016-02-04 17:21:33 -08:00
Sean Whitton
6a5ea2d64a
fix shebang in hocrtransform.py
2016-02-03 17:48:35 -07:00
James R. Barlow
ec3d92ad8e
Reorg gitignore
2016-01-30 15:28:24 -08:00
James R. Barlow
66a095d7de
Improve organization of CFFI setup
2016-01-30 15:19:40 -08:00
James R. Barlow
411981efbc
Experiment with CFFI instead of ctypes
2016-01-30 15:06:25 -08:00
James R. Barlow
350ad5210e
Leptonica: convert to CFFI
2016-01-20 15:03:07 -08:00
James R. Barlow
f3b588764e
Suppress tesseract argument printout
2016-01-20 15:02:48 -08:00
James R. Barlow
b49f5a7d77
Support optionally using leptonica to deskew
...
unpaper doesn't seem to be good at deskewing. It fails on test case
with a lot of italics. I think it also struggles on pages with a lot
of whitespace. Leptonica continues to shine here.
However, this is only a first crack at Leptonica. The leptonica module
should be redone to use cffi (more extensible).
Also considering the possibility of making all Leptonica calls in a forked
process to insulate the calling process from C code crashes and the
messy redirect of stdout/stderr to read Leptonica's errors.
I don't think the redirect is a huge problem as long as multiple processes
rather than multiple threads are used. The ruffus child process that is
handling a page is single-threaded and will not be affected by the
redirection. It just feels dirty. The main reason to consider a child
process is crash isolation.
2016-01-19 17:43:40 -08:00
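The crash-isolation idea in b49f5a7d77 can be illustrated with the standard library alone. This is only a sketch under assumptions: `run_isolated` is a hypothetical helper, the "native crash" is simulated with `os._exit`, and none of this is the project's actual Leptonica wrapper. The child gets its own stderr pipe, so the parent's file descriptors are never redirected.

```python
import subprocess
import sys


def run_isolated(code, timeout=30.0):
    """Run Python source in a child interpreter.

    Returns (returncode, stdout, stderr). If the child dies abruptly, the
    parent only sees a nonzero return code.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,  # child output arrives via pipes, not a redirect in the parent
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout, proc.stderr


# Stand-in for a C library that writes an error message and then dies.
crashing_child = (
    "import os, sys\n"
    "sys.stderr.write('simulated native error')\n"
    "sys.stderr.flush()\n"
    "os._exit(3)\n"
)

returncode, _, errors = run_isolated(crashing_child)
if returncode != 0:
    print("child failed with code", returncode, "and reported:", errors)
```

The trade-off is start-up cost per call, which matters less when each page is already handled by its own worker process.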
James R. Barlow
bacbcba58a
Merge branch 'release/v3.2-rc1'
v3.2rc1
2016-01-19 16:58:37 -08:00
James R. Barlow
52e8aa434f
Update release notes for v3.2-rc1
2016-01-19 16:49:49 -08:00
James R. Barlow
37c508f3f8
Better versioning: no silly version files, but wrong version in development
...
Small price to pay.
2016-01-19 16:07:52 -08:00
James R. Barlow
26e36422cc
More fiddling with version
2016-01-19 15:07:21 -08:00
James R. Barlow
f82cb002bc
Try automatic versioning with setuptools_scm
2016-01-19 13:27:18 -08:00
James R. Barlow
c1eb047a4b
Fix name of pdfa_def.ps
...
Used to include a copy of the parent dir's name.
2016-01-19 13:11:03 -08:00
James R. Barlow
626ca18f5c
Remove stale comment
2016-01-19 13:02:35 -08:00
James R. Barlow
9058dedfbe
New tests for ccitt, jbig2 encodings
2016-01-19 13:01:56 -08:00
James R. Barlow
a0952bfca3
Optimize: use img2pdf stream instead of repeated copies
2016-01-18 20:24:46 -08:00
James R. Barlow
354e61946e
Use os.makedirs for test output directories
...
Broke Travis
2016-01-16 02:47:56 -08:00
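For reference, creating a nested test output directory with os.makedirs (354e61946e) might look like the sketch below. The helper name and directory layout are hypothetical, and `exist_ok=True` assumes Python 3.2 or later.

```python
import os
import tempfile


def ensure_output_dir(path):
    """Create path and any missing parent directories; do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)
    return path


# Demonstrate inside a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as base:
    target = ensure_output_dir(os.path.join(base, "output", "hocr"))
    ensure_output_dir(target)  # a second call is harmless
    print(os.path.isdir(target))  # True
```

Unlike os.mkdir, this does not fail when the parent directory is missing, which is the usual situation on a fresh CI checkout.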
James R. Barlow
fd6d1d748a
Merge branch 'feature/pypdf-page-merge' into develop
2016-01-16 02:33:23 -08:00
James R. Barlow
360acd1e2c
Adjust test_oversample test case
...
Add -f to force generation of the background image at the desired
oversample resolution. Our new behavior is to only send the oversampled
image to Tesseract while leaving the main page intact unless asked to
deskew, clean, etc.
2016-01-15 15:55:23 -08:00
James R. Barlow
fc0479f110
Fix all but test_oversample[hocr]
2016-01-15 15:46:47 -08:00
James R. Barlow
62728205b6
Implement image+text merging in other cases
...
5 failed, 28 passed
failures:
test_oversample[hocr], test_skip_ocr, test_skip_big, test_maximum_options[hocr],
test_blank_input_pdf
2016-01-15 15:38:08 -08:00
James R. Barlow
dc0fb25e64
Render hocr page: no longer needs an image as input
2016-01-15 15:16:47 -08:00
James R. Barlow
f3e04cce56
Update pipeline.svg
2016-01-15 14:56:16 -08:00
James R. Barlow
7067110308
Add safety check to prevent merge from running when not sensible
2016-01-15 14:54:45 -08:00
James R. Barlow
599d889703
Implement "perfect reconstruction" - transfer page and watermark OCR layer
...
Works, but does not account for changes made by clean/deskew, etc.
Surprisingly, it works. PyPDF2 fixes since last attempt?
2016-01-15 14:39:12 -08:00
James R. Barlow
2fa8366632
Merge branch 'feature/test-pageinfo-cleanup' into develop
2016-01-15 14:18:01 -08:00
James R. Barlow
c368c51bad
New hocrtransform test
2016-01-15 14:14:08 -08:00
James R. Barlow
7c558b3713
Move pageinfo test into tests folder
2016-01-11 17:40:44 -08:00