OCRmyPDF

mirror of https://github.com/ocrmypdf/OCRmyPDF.git synced 2025-08-03 06:12:30 +00:00

Author	SHA1	Message	Date
James R. Barlow	8982b3e1e2	Update requirements: update requirements.txt and dev_requirements.txt to more recent versions; setup.py updated to Ubuntu 14.04 rather than 12.04 backports; request at least Pillow 3.1.1 now (since this makes jpeg/png mandatory)	2016-12-03 14:14:07 -08:00
James R. Barlow	245f05d5f4	docs: allow python setup.py install --force to bypass checks. ReadTheDocs needs this.	2016-10-28 00:07:26 -07:00
James R. Barlow	bd534c3313	main.py -> __main__.py. Executing a package with python -m packagename will check for __main__.py inside the package. In other words, main.py should have always been named __main__.py. In the unlikely event that someone depends on "import ocrmypdf.main" being meaningful, main.py continues to exist and replicates the behavior of __main__. (It's unlikely because import ocrmypdf.main does unpythonic ruffus-related things at import time, essentially configuring itself to work with sys.argv. To fix another day.) This should solve the problem of Debian needing to run test suites before installation and afterwards for continuous integration without having to patch either file, as python -m ocrmypdf will follow import order. That is, if the current directory contains "ocrmypdf/" (e.g. staging a new version) then that will be tested, else sys.path will be checked.	2016-08-31 17:01:42 -07:00
James R. Barlow	1b7b2f3695	v4.2.2 release notes, documentation improvements	2016-08-25 14:46:09 -07:00
James R. Barlow	b03028e31f	setup.py -> license is MIT	2016-08-19 10:14:33 -07:00
James R. Barlow	2c30f4bfc5	Travis: build partly working on trusty; tweak requirements again. The build is #122 https://travis-ci.org/jbarlow83/OCRmyPDF/builds/148255615 Errors seem to be related to either Ghostscript or leptonica? Maybe -dSAFER?	2016-07-29 03:08:01 -07:00
James R. Barlow	8458a51860	Tighten requirements and dependencies	2016-07-27 14:47:59 -07:00
James R. Barlow	b964999427	Update filename references from sRGB_IEC to sRGB	2016-05-10 21:58:04 -07:00
James R. Barlow	fe14cb57c0	Fix ruffus exception output. I found this issue in ruffus 2.6.3 https://github.com/bunbun/ruffus/issues/65 also discussed here https://github.com/bunbun/ruffus/pull/67 In ruffus 2.6.3, RethrownJobError doesn't follow the normal conventions, so its exceptions cause problems when they cross process boundaries. This change carefully examines the various forms of ruffus exception objects that can appear in 2.6.3 and parses them more carefully. It also removes any direct posting of the exception to the logger because this triggers another serializing of the exception object, mutating it further.	2016-04-28 00:38:50 -07:00
James R. Barlow	368252a243	setuptools_scm_git_archive seems suddenly broken	2016-03-01 02:09:45 -08:00
James R. Barlow	3d0e8c9629	Provide our own sRGB profile instead of Ghostscript's	2016-03-01 01:27:40 -08:00
James R. Barlow	71d616e413	Restore Dockerfile on local and probably on automated build as well	2016-02-17 00:13:45 -08:00
James R. Barlow	a87aa71d85	Remove old documentation about Pillow not linking jpeg, zlib As of Pillow 3.0.0 this is fixed, so make Pillow 3 a requirement	2016-02-16 14:29:31 -08:00
James R. Barlow	35b1ca2be2	Travis: try replacing non-standard invocation of py.test. It seems the normal thing to wire up python setup.py test to invoke the test suite rather than py.test. This may be the reason for the past chain of cffi-related commits.	2016-02-16 05:36:14 -08:00
James R. Barlow	8cd84afac8	Revert "Try moving leptonica build script, playing with wheels a bit" This reverts commit ec2c6c312bc7e64c25b26563e9093d89ea1b9032.	2016-02-16 05:04:20 -08:00
James R. Barlow	ec2c6c312b	Try moving leptonica build script, playing with wheels a bit	2016-02-16 04:05:58 -08:00
James R. Barlow	2752bda80b	Merge branch 'feature/leptdeskew' into feature/logging. Need leptonica for testing now, I think. Conflicts: ocrmypdf/tesseract.py, requirements.txt, setup.py	2016-02-08 12:34:48 -08:00
James R. Barlow	2d15c09cca	Merge branch 'develop'	2016-02-06 18:18:49 -08:00
James R. Barlow	e9b87cefcc	Try img2pdf 0.2	2016-02-05 14:38:37 -08:00
James R. Barlow	60593b5ad3	Tighten up package requirements to deal with incompatible img2pdf 0.2 release	2016-02-05 14:37:05 -08:00
James R. Barlow	f708b11ea4	Fix Python 2.7 warning	2016-02-05 02:34:49 -08:00
James R. Barlow	66a095d7de	Improve organization of CFFI setup	2016-01-30 15:19:40 -08:00
James R. Barlow	350ad5210e	Leptonica: convert to CFFI	2016-01-20 15:03:07 -08:00
James R. Barlow	37c508f3f8	Better versioning: no silly version files, but wrong ver in development Small price to pay.	2016-01-19 16:07:52 -08:00
James R. Barlow	26e36422cc	More fiddling with version	2016-01-19 15:07:21 -08:00
James R. Barlow	f82cb002bc	Try automatic versioning with setuptools_scm	2016-01-19 13:27:18 -08:00
James R. Barlow	6af0815681	Bump version	2016-01-09 18:45:06 -08:00
James R. Barlow	424b4b33b1	Just go right ahead and demand Python 3.4	2016-01-04 12:56:51 -08:00
James R. Barlow	e510f89792	Python 2 warning message	2015-12-21 09:38:38 -08:00
James R. Barlow	79b3472b26	All tests passed, bump version	2015-12-04 04:31:01 -08:00
James R. Barlow	281eafada0	bump to v3.0 and move repos	2015-09-05 00:53:14 -07:00
James R. Barlow	c14e10128a	Bump version to -rc9	2015-08-29 16:43:22 -07:00
James R. Barlow	2ce6834be4	Bump to -rc8	2015-08-24 01:25:01 -07:00
James R. Barlow	aab08bfcc7	Fix requirements.txt problem	2015-08-23 12:30:40 -07:00
James R. Barlow	ee7f008ff5	Require unpaper 6.1; no messing around with broken versions	2015-08-22 01:51:08 -07:00
James R. Barlow	4f3673d14d	Update notes for -rc6	2015-08-22 00:40:07 -07:00
James R. Barlow	9dad40b5a3	Major overhaul of the Dockerfile. Switched from Ubuntu to debian:stretch because stretch has more recent versions of our binary packages and starts smaller. In particular, stretch has both pillow==2.9.0 and reportlab==3.2.0 available as system packages, which saves the considerable hassle of installing a toolchain. Instead, a pyvenv is set up with access to system's site-packages (note: needs two steps), making the binary-dependent packages available. Then the remaining packages are installed into the pyvenv with --no-cache-dir to avoid saving files. And there we are. Image is still very large (>500 MB), but programs like reportlab require font rendering capabilities so they pull in large portions of the Linux graphics stack. Not much will shrink that.	2015-08-20 01:25:31 -07:00
James R. Barlow	8e2d690cb0	Rework Dockerfile, setup.py to work with wheels for better cache use	2015-08-19 13:43:32 -07:00
James R. Barlow	2dff3e07ce	Drop libxml2 dependency It seems that Python's internal XML parser is good enough to do the job.	2015-08-17 15:26:07 -07:00
James R. Barlow	53c88093ad	Bump to -rc5	2015-08-16 02:19:04 -07:00
James R. Barlow	30072e0c70	Pillow sucks. Far from being fluffy or friendly, Pillow silently allows installation of itself without support for major image types. Reportlab calls for pillow 2.4.0. On Ubuntu 14.04 LTS this will trigger an upgrade of pillow that will be built without JPEG or ZLIB so it is effectively neutered, and unfortunately Pillow will not detect this situation at install time and guide users to a resolution. Instead, you see nasty stack traces. So add a run-time check to ensure that Pillow is sane and capable of JPEG and PNG support since both may be used internally.	2015-08-16 00:54:03 -07:00
James R. Barlow	eb04a890b2	Relax Pillow requirement for Ubuntu 14.04 LTS	2015-08-15 15:55:56 -07:00
James R. Barlow	0c53adb04f	setup: rollback lxml version to 3.3.3 - that's the latest in Ubuntu 14.04	2015-08-15 15:25:58 -07:00
James R. Barlow	87aeeacb04	Fix erroneous instruction to "apt-get install tesseract" Should be tesseract-ocr	2015-08-15 15:17:38 -07:00
James R. Barlow	f6f4705ea3	Remove Java from setup.py	2015-08-14 00:44:56 -07:00
James R. Barlow	11dd9f14c3	setup.py: block unsafe 'upload', say to use twine instead	2015-08-09 14:16:30 -07:00
James R. Barlow	16d24f1166	Bump version to -rc4	2015-08-05 23:26:38 -07:00
James R. Barlow	a036de318e	Replace mupdf and poppler with qpdf. Drop two dependencies and replace them with one that does the job of both. Smells like progress. mupdf does PDF file repair and rendering; poppler does rendering and page splitting; qpdf does PDF file repair and page splitting; ghostscript does PDF file repair, rendering, and page splitting (sort of). So we use qpdf. Ghostscript's page splitting is supposedly less efficient because it reprints the page (PDF -> Postscript -> PDF) and possibly loses quality. qpdf's library could be used to improve performance. This causes a slight performance regression: py.test tests/test_main.py::test_maximum_options went from 187 seconds up to 192. This is likely due to O(n) serialized invocations of qpdf compared to a single serialized call to pdfseparate. Could improve on this situation by using the example code in qpdf (pdf-split-pages.cc), or by creating marker files in split_pages() and then writing a new @transform function that would split pages on each CPU. Probably not worth it, overall, unless this causes problems on files with hundreds of pages.	2015-07-30 04:16:35 -07:00
James R. Barlow	9918c4020e	Use img2pdf in test case because it does a better job	2015-07-30 03:35:56 -07:00
James R. Barlow	47e50f82c4	setup.py: allow mutool 1.7	2015-07-28 13:37:32 -07:00

60 Commits