OCRmyPDF

mirror of https://github.com/ocrmypdf/OCRmyPDF.git synced 2025-11-17 10:34:48 +00:00

Author	SHA1	Message	Date
James R. Barlow	2d15c09cca	Merge branch 'develop'	2016-02-06 18:18:49 -08:00
James R. Barlow	04cb8865b0	Fetch application from PyPI instead of local. setuptools_scm barfs because it can't find the version, because Docker Hub retrieves the application from GitHub in a way that omits the necessary details. I suppose there is a certain logic to Docker only using the tagged released versions from PyPI, so go with it. The other attractive option is to nix setuptools_scm.	2016-02-06 18:18:30 -08:00
James R. Barlow	6fe32bbaf7	v3.2.1 (tag: v3.2.1)	2016-02-05 16:10:18 -08:00
James R. Barlow	4abb20390d	Bump Dockerfile versions	2016-02-05 16:08:26 -08:00
James R. Barlow	daa3916430	Fix img2pdf 0.2 usage. All tests pass when forced to rely on img2pdf, so seems okay	2016-02-05 15:13:26 -08:00
James R. Barlow	e9b87cefcc	Try img2pdf 0.2	2016-02-05 14:38:37 -08:00
James R. Barlow	60593b5ad3	Tighten up package requirements to deal with incompatible img2pdf 0.2 release	2016-02-05 14:37:05 -08:00
James R. Barlow	f708b11ea4	Fix Python 2.7 warning	2016-02-05 02:34:49 -08:00
James R. Barlow	7982f58b2e	Try tweaking Dockerfile for automated build again (tag: v3.2.post2)	2016-02-05 01:38:59 -08:00
James R. Barlow	e805c1908a	Minor fix for Dockerfile polyglot (tag: v3.2.post1)	2016-02-05 00:52:27 -08:00
James R. Barlow	cb3ba8e973	Merge branch 'release/v3.2' into develop	2016-02-05 00:10:41 -08:00
James R. Barlow	344fc40cbc	Merge branch 'release/v3.2' (tag: v3.2)	2016-02-05 00:10:41 -08:00
James R. Barlow	7e5c37137b	Merge branch 'develop' into release/v3.2	2016-02-04 23:42:06 -08:00
James R. Barlow	1aae11714b	Update release notes for v3.2	2016-02-04 23:41:33 -08:00
James R. Barlow	d82f14a7aa	Update .gitignore	2016-02-04 18:51:41 -08:00
James R. Barlow	4b65e0b093	Set JPEG output quality to 95 for better transcoding	2016-02-04 18:49:09 -08:00
James R. Barlow	43b0faa830	Bug in tesseract_noop spoof: produced wrong page sizes. Now checks input image to ensure the implied page size of its .hocr file matches the rest of the PDF.	2016-02-04 18:48:22 -08:00
James R. Barlow	8674c9fb20	Merge commit 'ccfbb54e8c26784e438ba2fcac2179f21e7d857b' into release/v3.2	2016-02-04 17:39:36 -08:00
jbarlow83	ccfbb54e8c	Update release notes for v3.2. Fix the notes	2016-02-04 17:37:30 -08:00
James R. Barlow	9893ebf889	Suppress tesseract argument printout	2016-02-04 17:26:36 -08:00
James R. Barlow	303eb3e93a	Merge commit 'ca546d70e5bff9e9b115371f7813f3c326822bd8' into release/v3.2	2016-02-04 17:25:56 -08:00
jbarlow83	ca546d70e5	Merge pull request #45 from spwhitton/hocrtransform-shebang-fix. Fix shebang in hocrtransform.py	2016-02-04 17:21:33 -08:00
Sean Whitton	6a5ea2d64a	fix shebang in hocrtransform.py	2016-02-03 17:48:35 -07:00
James R. Barlow	ec3d92ad8e	Reorg gitignore	2016-01-30 15:28:24 -08:00
James R. Barlow	66a095d7de	Improve organization of CFFI setup	2016-01-30 15:19:40 -08:00
James R. Barlow	411981efbc	Experiment with CFFI instead of ctypes	2016-01-30 15:06:25 -08:00
James R. Barlow	350ad5210e	Leptonica: convert to CFFI	2016-01-20 15:03:07 -08:00
James R. Barlow	f3b588764e	Suppress tesseract argument printout	2016-01-20 15:02:48 -08:00
James R. Barlow	b49f5a7d77	Support optionally using leptonica to deskew. unpaper doesn't seem to be good at deskewing. It fails on a test case with a lot of italics. I think it also struggles on pages with a lot of whitespace. Leptonica continues to shine here. However, this is only a first crack at Leptonica. The leptonica module should be redone to use cffi (more extensible). Also considering the possibility of making all Lept calls in a forked process to insulate the calling process from C code crashes and the messy redirect of stdout/stderr to read Leptonica's errors. I don't think the redirect is a huge problem as long as multiple processes rather than multiple threads are used. The ruffus child process that is handling a page is single threaded and will not be affected by the redirection. It just feels dirty. The main reason to consider a child process is crash isolation.	2016-01-19 17:43:40 -08:00
James R. Barlow	bacbcba58a	Merge branch 'release/v3.2-rc1' (tag: v3.2rc1)	2016-01-19 16:58:37 -08:00
James R. Barlow	52e8aa434f	Update release notes for v3.2-rc1	2016-01-19 16:49:49 -08:00
James R. Barlow	37c508f3f8	Better versioning: no silly version files, but wrong ver in development. Small price to pay.	2016-01-19 16:07:52 -08:00
James R. Barlow	26e36422cc	More fiddling with version	2016-01-19 15:07:21 -08:00
James R. Barlow	f82cb002bc	Try automatic versioning with setuptools_scm	2016-01-19 13:27:18 -08:00
James R. Barlow	c1eb047a4b	Fix name of pdfa_def.ps. Used to include a copy of the parent dir's name.	2016-01-19 13:11:03 -08:00
James R. Barlow	626ca18f5c	Remove stale comment	2016-01-19 13:02:35 -08:00
James R. Barlow	9058dedfbe	New tests for ccitt, jbig2 encodings	2016-01-19 13:01:56 -08:00
James R. Barlow	a0952bfca3	Optimize: use img2pdf stream instead of repeated copies	2016-01-18 20:24:46 -08:00
James R. Barlow	354e61946e	Use os.makedirs for test output directories. Broke Travis	2016-01-16 02:47:56 -08:00
James R. Barlow	fd6d1d748a	Merge branch 'feature/pypdf-page-merge' into develop	2016-01-16 02:33:23 -08:00
James R. Barlow	360acd1e2c	Adjust test_oversample test case. Add -f to force generation of the background image at the desired oversample resolution. Our new behavior is to only send the oversampled image to Tesseract while leaving the main page intact unless asked to deskew, clean, etc.	2016-01-15 15:55:23 -08:00
James R. Barlow	fc0479f110	Fix all but test_oversample[hocr]	2016-01-15 15:46:47 -08:00
James R. Barlow	62728205b6	Implement image+text merging in other cases. 5 failed, 28 passed; failures: test_oversample[hocr], test_skip_ocr, test_skip_big, test_maximum_options[hocr], test_blank_input_pdf	2016-01-15 15:38:08 -08:00
James R. Barlow	dc0fb25e64	Render hocr page: no longer needs an image as input	2016-01-15 15:16:47 -08:00
James R. Barlow	f3e04cce56	Update pipeline.svg	2016-01-15 14:56:16 -08:00
James R. Barlow	7067110308	Add safety check to prevent merge from running when not sensible	2016-01-15 14:54:45 -08:00
James R. Barlow	599d889703	Implement "perfect reconstruction" - transfer page and watermark OCR layer. Works, does not account for changes to clean/deskew, etc. Surprisingly, it works. PyPDF2 fixes since last attempt?	2016-01-15 14:39:12 -08:00
James R. Barlow	2fa8366632	Merge branch 'feature/test-pageinfo-cleanup' into develop	2016-01-15 14:18:01 -08:00
James R. Barlow	c368c51bad	New hocrtransform test	2016-01-15 14:14:08 -08:00
James R. Barlow	7c558b3713	Move pageinfo test into tests folder	2016-01-11 17:40:44 -08:00

2676 Commits