OCRmyPDF

mirror of https://github.com/ocrmypdf/OCRmyPDF.git synced 2025-08-18 13:42:12 +00:00

Author	SHA1	Message	Date
James R. Barlow	44f47fba21	PDF/A: handle case of no XMP metadata gracefully	2016-08-03 02:57:25 -07:00
James R. Barlow	02584094a1	Suppress NUL bytes in metadata from input files	2016-08-03 02:47:44 -07:00
James R. Barlow	91d715ac93	Add test cases for --output-type	2016-08-03 02:47:18 -07:00
James R. Barlow	35addb8a33	Complain if Chinese is requested with settings known to not work. Should extend test for other Asian languages.	2016-08-03 01:29:12 -07:00
James R. Barlow	d32ea8d0dd	Remove dead code from qpdf merge + PyPDF2 metadata patching. I tried "qpdf merge + PyPDF2 metadata patching" first. The problem is that PyPDF2 produces PDF-1.3 by default, and generally I have less confidence in it. The new approach is to stuff the Document Info metadata into the first page with PyPDF2, cross fingers, and use qpdf to merge. It's not quite as clean and might harm the first page, but it's better than shipping files produced by PyPDF2.	2016-08-03 01:28:27 -07:00
James R. Barlow	12575d594a	Improve PDF/A validity checking at end	2016-08-03 01:26:16 -07:00
James R. Barlow	0746083301	Fix failing test case - unbound local variable in finally block	2016-08-03 01:00:38 -07:00
James R. Barlow	5c99acf6d1	Experimental change to use qpdf to merge files (disables Ghostscript). All tests pass except test_input_file_not_a_pdf. Not sure if PyPDF2 metadata generation will mangle the first page.	2016-08-03 00:56:44 -07:00
James R. Barlow	2b10df7b74	leptonica: note about when it may be safe to drop <1.72 workaround	2016-08-03 00:54:37 -07:00
James R. Barlow	ebe68de4ff	Functional qpdfmerge with PyPDF2 for DocumentInfo block. Tests mostly passing. For the moment this is the new default, although PyPDF2 produces PDF-1.3, which will be wrong for some contents and possibly should be repaired with qpdf. Again. Looks like it could work better to merge with PyPDF2 and fix everything with qpdf.	2016-08-02 16:48:13 -07:00
James R. Barlow	b17c6a146d	Experimental qpdf merging. Does not copy /Catalog metadata, but otherwise functional.	2016-08-02 02:19:02 -07:00
James R. Barlow	46d837c866	Clarify trusty/precise stuff	2016-08-02 01:29:33 -07:00
James R. Barlow	24856b61e4	Fix typo in readme	2016-08-02 01:29:22 -07:00
James R. Barlow	8d0c6ff616	pyvenv -> python3 -m venv. Sadly, the Python developers are removing this script.	2016-08-02 01:27:50 -07:00
James R. Barlow	0b24f971cd	ocrmyimage: complain about ICC profiles being presumed	2016-08-02 01:22:36 -07:00
James R. Barlow	bc5d3824bd	Don't overload --oversample, use --image-dpi instead for images	2016-07-31 02:09:30 -07:00
James R. Barlow	4356983707	Suppress overly long stack traces on traverse_ruffus_exception	2016-07-31 02:06:44 -07:00
James R. Barlow	2414b79ee6	More cleanup of exception related errors	2016-07-31 01:48:13 -07:00
James R. Barlow	968e1546f0	Refactor image file triage	2016-07-31 01:47:57 -07:00
James R. Barlow	48213c9c3f	Update release notes and readme	2016-07-29 15:25:16 -07:00
James R. Barlow	f385772d21	Refactor "is this an iterable that's not a string?" test	2016-07-29 15:25:02 -07:00
James R. Barlow	d257c83520	Most tests were failing at split_pages(). It seems that ruffus sometimes sends ['inputfile.pdf'] instead of a bare string.	2016-07-29 14:59:17 -07:00
James R. Barlow	7b72ffec4f	ocrmyimage: better handling of missing/invalid DPI	2016-07-29 14:38:07 -07:00
James R. Barlow	757f6826dc	ocrmyimage: attempt conversion to PDF if input file is not a PDF. First cut. May have broken ruffus errors again too.	2016-07-29 14:03:19 -07:00
James R. Barlow	5df83a0d30	Travis: use Python 3.5 too	2016-07-29 13:31:40 -07:00
James R. Barlow	d70e3d3753	ruffus exceptions: for clarity only, don't iterate strings. It's a good habit to ensure any iterator test is explicit about allowing or disallowing strings.	2016-07-29 13:31:24 -07:00
James R. Barlow	0dfceedcfb	Remove old OCRmyPDF 2.x from release notes; update 4.2 notes	2016-07-29 03:08:59 -07:00
James R. Barlow	2c30f4bfc5	Travis: build partly working on trusty; tweak requirements again. The build is #122: https://travis-ci.org/jbarlow83/OCRmyPDF/builds/148255615. Errors seem to be related to either Ghostscript or leptonica? Maybe -dSAFER?	2016-07-29 03:08:01 -07:00
James R. Barlow	9e7fb52b47	Travis: add PPA to support unpaper	2016-07-29 01:57:12 -07:00
James R. Barlow	bb5fd38e38	Remove additional PPAs and try again	2016-07-29 01:47:56 -07:00
James R. Barlow	7c8cf5cfa2	Try travis-trusty. This removes some backports for packages that Ubuntu trusty offers but for which Ubuntu precise needed help.	2016-07-29 01:44:57 -07:00
James R. Barlow	fef35e4eb2	Fix handling of DPI for rare case of JPEG recompression after deskew/clean. This test is exercised by page 4 of multipage.pdf. If all images are JPEGs, and one of deskew/clean removes DPI information, make sure that we can get the right information back and that the DPI stays square.	2016-07-29 01:34:52 -07:00
James R. Barlow	8f77576dc4	Fix non-square image resolution for "hocr" case; use img2pdf 0.2.1. Tesseract renderer not immediately fixable.	2016-07-28 16:43:51 -07:00
James R. Barlow	b3fcf24a26	Refactor DPI: fix regressions in test suite. Some called functions are particular about the data format of DPI and don't like to deal with the Decimal() returned by PyPDF2. Convert to float and int where needed.	2016-07-28 00:19:32 -07:00
James R. Barlow	16e4d342d2	Bug fix: --force-ocr should still run on pages with no images. Useful for people who want to reprocess text. This also requires --oversample because DPI is undefined. To be fixed in next commit.	2016-07-27 15:06:49 -07:00
James R. Barlow	8458a51860	Tighten requirements and dependencies	2016-07-27 14:47:59 -07:00
James R. Barlow	636d1903b3	Ghostscript: do raster output with -dSAFER. -dSAFER does not work when rendering PDF/A, because that needs to load the ICC file, and -dSAFER prevents access to external files.	2016-07-27 00:54:40 -07:00
jbarlow83	514efa36fc	Readme: Add table of contents, brew install tesseract --with-language packs (tag: v4.1.4)	2016-07-24 11:21:46 -07:00
James R. Barlow	bd48f40d3d	v4.1.4 release notes (tag: v4.1.4rc1)	2016-07-17 00:35:06 -07:00
James R. Barlow	c02dbc809a	Merge commit '68cf9cbd87c188823027f9d1bfe9029017e7281f' into develop	2016-07-17 00:29:48 -07:00
James R. Barlow	410111d6fb	Bug fix: Monochrome images with ICC profile treated as full color images. Issue #79. A user submitted a PDF with an ICC profile attached to the monochrome image in the input file, which is not common but useful for PDFs that want to define how light the paper is or how dark the black is. The code was written to assume unusual images are full color unless it can prove otherwise. Handle this simple case. Other ICC cases should be tested.	2016-07-17 00:29:32 -07:00
jbarlow83	68cf9cbd87	.rst: add code-block markup	2016-07-05 14:03:55 -07:00
jbarlow83	c9b2540d9d	Fix some .rst formatting errors	2016-07-05 13:48:19 -07:00
jbarlow83	1bacf35a2c	Update license information for encrypted_algo4.pdf	2016-06-24 14:25:15 -07:00
jbarlow83	8aef0d9277	Merge pull request #76 from Jmuccigr/patch-2. Adding explicit reference to help	2016-06-24 14:21:23 -07:00
John Muccigrosso	b2fa8645ba	Adding explicit reference to help	2016-06-24 13:44:12 -05:00
James R. Barlow	c96823a648	v4.1.3 release notes (tags: v4.1.3, v4.1.3rc1)	2016-06-23 13:47:56 -07:00
James R. Barlow	3807b7d655	Merge branch 'feature/leptfun' into develop	2016-06-23 13:45:35 -07:00
James R. Barlow	a45505cf1d	Fix order of operations in matrix multiplication. Issue #73. The order of operations happens not to matter for scaling but does matter for translation. We only need scaling to find the DPI, so the error was not noticed. Mainly useful for other uses of this library.	2016-06-23 13:36:23 -07:00
James R. Barlow	b4a734fc0d	Test case for "algorithm 4" test. Algorithm 4 -> PDF version 1.6	2016-06-23 13:21:26 -07:00
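
Three commits above (f385772d21, d70e3d3753, d257c83520) deal with the same pitfall. ruffus sometimes passed ['inputfile.pdf'] where a bare string was expected, and a naive "is this iterable?" check treats a string as an iterable of characters. The following minimal Python sketch shows such a test and assumes the goal is to accept either form. The names is_iterable_notstr and single_input are illustrative and are not necessarily the names used in the OCRmyPDF source.

```python
from collections.abc import Iterable


def is_iterable_notstr(thing):
    """True for lists, tuples, generators, etc., but False for str and bytes.

    Strings are iterable (over their characters), so they must be excluded
    explicitly, or 'inputfile.pdf' would look like a sequence of 13 items.
    """
    return isinstance(thing, Iterable) and not isinstance(thing, (str, bytes))


def single_input(value):
    """Hypothetical helper: unwrap ['inputfile.pdf'] to 'inputfile.pdf'.

    A bare string is returned unchanged, a one-element iterable is unwrapped,
    and anything else is rejected rather than guessed at.
    """
    if is_iterable_notstr(value):
        items = list(value)
        if len(items) != 1:
            raise ValueError(f"expected exactly one input file, got {items!r}")
        return items[0]
    return value


print(single_input('inputfile.pdf'))    # inputfile.pdf
print(single_input(['inputfile.pdf']))  # inputfile.pdf
```

Raising on multi-element lists keeps the check explicit, which matches the note in d70e3d3753 that an iterator test should state clearly whether it allows strings.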

2676 Commits
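
Commit a45505cf1d says that the order of matrix multiplication does not matter for scaling but does matter for translation. The minimal Python sketch below illustrates why. It assumes PDF's convention that the six numbers [a b c d e f] of a transformation matrix expand to a 3x3 matrix applied to row vectors [x y 1]. The helper names matmul3 and pdf_matrix are illustrative only.

```python
def matmul3(m, n):
    """Multiply two 3x3 matrices given as lists of rows."""
    return [[sum(m[i][k] * n[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def pdf_matrix(a, b, c, d, e, f):
    """Expand a PDF [a b c d e f] matrix into its 3x3 form (row-vector convention)."""
    return [[a, b, 0],
            [c, d, 0],
            [e, f, 1]]


scale = pdf_matrix(2, 0, 0, 3, 0, 0)        # scale x by 2, y by 3
translate = pdf_matrix(1, 0, 0, 1, 10, 20)  # shift by (10, 20)

# With row vectors, matmul3(m1, m2) applies m1 first, then m2.
scale_then_translate = matmul3(scale, translate)
translate_then_scale = matmul3(translate, scale)

# The scaling terms (a, d) agree in both orders...
print(scale_then_translate[0][0], scale_then_translate[1][1])  # 2 3
print(translate_then_scale[0][0], translate_then_scale[1][1])  # 2 3
# ...but the translation terms (e, f) differ.
print(scale_then_translate[2][:2])  # [10, 20]
print(translate_then_scale[2][:2])  # [20, 60]
```

In the second product, the shift is applied first and then scaled along with everything else, so (10, 20) becomes (20, 60). A DPI calculation that reads only the scaling terms would never notice the wrong order.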