60 Commits

Author SHA1 Message Date
James R. Barlow
8982b3e1e2 Update requirements
-update requirements.txt and dev_requirements.txt to more recent versions
-setup.py updated to Ubuntu 14.04 rather than 12.04 backports
-request at least Pillow 3.1.1 now (since this makes jpeg/png mandatory)
2016-12-03 14:14:07 -08:00
James R. Barlow
245f05d5f4 docs: allow python setup.py install --force to bypass checks
ReadTheDocs needs this.
2016-10-28 00:07:26 -07:00
James R. Barlow
bd534c3313 main.py -> __main__.py
Executing a package with python -m packagename will check for
__main__.py inside the package.  In other words main.py should have
always been named __main__.py.

In the unlikely event that someone depends on "import ocrmypdf.main"
being meaningful, main.py continues to exist and replicates the
behavior of __main__.  (It's unlikely because import ocrmypdf.main does
unpythonic ruffus-related things at import time, essentially
configuring itself to work with sys.argv.  To fix another day.)

This should solve the problem of Debian needing to run test suites
before installation and afterwards for continuous integration without
having to patch either file, as python -m ocrmypdf will follow import
order.  That is, if the current directory contains "ocrmypdf/" (e.g.
staging a new version) then that will be tested, else sys.path will
be checked.
2016-08-31 17:01:42 -07:00
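The layout described in bd534c3313 can be sketched as follows. This is a minimal illustration, not the project's code: the package name demo_pkg and its run() function are hypothetical, and the package is written to a temporary directory only so the example can execute end to end.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def build_demo_package(root: Path) -> None:
    """Write a tiny hypothetical package named demo_pkg under root."""
    pkg = root / "demo_pkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    # __main__.py is the file that `python -m demo_pkg` executes.
    (pkg / "__main__.py").write_text(
        "def run():\n"
        "    print('running demo_pkg')\n"
        "\n"
        "if __name__ == '__main__':\n"
        "    run()\n"
    )
    # main.py stays behind as a compatibility shim that replicates __main__.
    (pkg / "main.py").write_text(
        "from demo_pkg.__main__ import run\n"
        "\n"
        "if __name__ == '__main__':\n"
        "    run()\n"
    )


def run_module(root: Path, module: str) -> str:
    """Run `python -m module` with root as the current directory."""
    result = subprocess.run(
        [sys.executable, "-m", module],
        cwd=str(root),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_demo_package(root)
    via_package = run_module(root, "demo_pkg")    # uses __main__.py
    via_shim = run_module(root, "demo_pkg.main")  # uses the shim
```

Because python -m puts the current directory first on the import path, a package staged in the working directory is picked up ahead of an installed copy, which is the behavior the commit message relies on.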
James R. Barlow
1b7b2f3695 v4.2.2 release notes, documentation improvements 2016-08-25 14:46:09 -07:00
James R. Barlow
b03028e31f setup.py -> license is MIT 2016-08-19 10:14:33 -07:00
James R. Barlow
2c30f4bfc5 Travis: build partly working on trusty; tweak requirements again
The build is #122
https://travis-ci.org/jbarlow83/OCRmyPDF/builds/148255615

Errors seem to be related to either Ghostscript or leptonica? Maybe
-dSAFER?
2016-07-29 03:08:01 -07:00
James R. Barlow
8458a51860 Tighten requirements and dependencies 2016-07-27 14:47:59 -07:00
James R. Barlow
b964999427 Update filename references from sRGB_IEC to sRGB 2016-05-10 21:58:04 -07:00
James R. Barlow
fe14cb57c0 Fix ruffus exception output
I found this issue in ruffus 2.6.3
https://github.com/bunbun/ruffus/issues/65
also discussed here
https://github.com/bunbun/ruffus/pull/67

ruffus 2.6.3's RethrownJobError doesn't follow the normal conventions, so
its exceptions cause problems when they cross process boundaries.
This change examines the various forms of ruffus exception objects
that can appear in 2.6.3 and parses them more carefully. It
also removes any direct posting of the exception to the logger because
this triggers another serializing of the exception object, mutating it
further.
2016-04-28 00:38:50 -07:00
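The general class of problem mentioned in fe14cb57c0 can be illustrated with plain Python. This is a sketch under assumptions, not ruffus code: FragileError and RobustError are hypothetical classes, and the example shows only one common way an exception can break when it is pickled to cross a process boundary, namely a constructor whose signature does not match the args it forwards to Exception.

```python
import pickle


class FragileError(Exception):
    """Hypothetical: requires two arguments but forwards only one."""

    def __init__(self, message, job_name):
        super().__init__(message)  # self.args becomes (message,)
        self.job_name = job_name


class RobustError(Exception):
    """Hypothetical: forwards every constructor argument to Exception."""

    def __init__(self, message, job_name):
        super().__init__(message, job_name)  # self.args matches __init__
        self.job_name = job_name


def survives_pickling(exc: Exception) -> bool:
    """Return True if exc can be serialized and rebuilt."""
    try:
        # Unpickling calls type(exc)(*exc.args), so a mismatch between
        # args and the __init__ signature raises TypeError here.
        pickle.loads(pickle.dumps(exc))
    except TypeError:
        return False
    return True


fragile_ok = survives_pickling(FragileError("boom", "job-1"))
robust_ok = survives_pickling(RobustError("boom", "job-1"))
```

Serializing succeeds in both cases; only rebuilding the fragile exception fails, which is why such errors tend to surface in the receiving process rather than where the exception was raised.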
James R. Barlow
368252a243 setuptools_scm_git_archive seems suddenly broken 2016-03-01 02:09:45 -08:00
James R. Barlow
3d0e8c9629 Provide our own sRGB profile instead of Ghostscript's 2016-03-01 01:27:40 -08:00
James R. Barlow
71d616e413 Restore Dockerfile on local and probably on automated build as well 2016-02-17 00:13:45 -08:00
James R. Barlow
a87aa71d85 Remove old documentation about Pillow not linking jpeg, zlib
As of Pillow 3.0.0 this is fixed, so make Pillow 3 a requirement
2016-02-16 14:29:31 -08:00
James R. Barlow
35b1ca2be2 Travis: try replacing non-standard invocation of py.test
It seems the normal thing is to wire up python setup.py test to invoke
the test suite rather than py.test. This may be the reason for the
past chain of cffi-related commits.
2016-02-16 05:36:14 -08:00
James R. Barlow
8cd84afac8 Revert "Try moving leptonica build script, playing with wheels a bit"
This reverts commit ec2c6c312bc7e64c25b26563e9093d89ea1b9032.
2016-02-16 05:04:20 -08:00
James R. Barlow
ec2c6c312b Try moving leptonica build script, playing with wheels a bit 2016-02-16 04:05:58 -08:00
James R. Barlow
2752bda80b Merge branch 'feature/leptdeskew' into feature/logging
Need leptonica for testing now, I think
# Conflicts:
#	ocrmypdf/tesseract.py
#	requirements.txt
#	setup.py
2016-02-08 12:34:48 -08:00
James R. Barlow
2d15c09cca Merge branch 'develop' 2016-02-06 18:18:49 -08:00
James R. Barlow
e9b87cefcc Try img2pdf 0.2 2016-02-05 14:38:37 -08:00
James R. Barlow
60593b5ad3 Tighten up package requirements to deal with incompatible img2pdf 0.2 release 2016-02-05 14:37:05 -08:00
James R. Barlow
f708b11ea4 Fix Python 2.7 warning 2016-02-05 02:34:49 -08:00
James R. Barlow
66a095d7de Improve organization of CFFI setup 2016-01-30 15:19:40 -08:00
James R. Barlow
350ad5210e Leptonica: convert to CFFI 2016-01-20 15:03:07 -08:00
James R. Barlow
37c508f3f8 Better versioning: no silly version files, but wrong ver in development
Small price to pay.
2016-01-19 16:07:52 -08:00
James R. Barlow
26e36422cc More fiddling with version 2016-01-19 15:07:21 -08:00
James R. Barlow
f82cb002bc Try automatic versioning with setuptools_scm 2016-01-19 13:27:18 -08:00
James R. Barlow
6af0815681 Bump version 2016-01-09 18:45:06 -08:00
James R. Barlow
424b4b33b1 Just go right ahead and demand Python 3.4 2016-01-04 12:56:51 -08:00
James R. Barlow
e510f89792 Python 2 warning message 2015-12-21 09:38:38 -08:00
James R. Barlow
79b3472b26 All tests passed, bump version 2015-12-04 04:31:01 -08:00
James R. Barlow
281eafada0 bump to v3.0 and move repos 2015-09-05 00:53:14 -07:00
James R. Barlow
c14e10128a Bump version to -rc9 2015-08-29 16:43:22 -07:00
James R. Barlow
2ce6834be4 Bump to -rc8 2015-08-24 01:25:01 -07:00
James R. Barlow
aab08bfcc7 Fix requirements.txt problem 2015-08-23 12:30:40 -07:00
James R. Barlow
ee7f008ff5 Require unpaper 6.1; no messing around with broken versions 2015-08-22 01:51:08 -07:00
James R. Barlow
4f3673d14d Update notes for -rc6 2015-08-22 00:40:07 -07:00
James R. Barlow
9dad40b5a3 Major overhaul of the Dockerfile
Switched from Ubuntu to debian:stretch because stretch has more recent
versions of our binary packages and starts smaller.  In particular,
stretch has both pillow==2.9.0 and reportlab==3.2.0 available as system
packages which saves the considerable hassle of installing a toolchain.

Instead, a pyvenv is set up with access to system's site-packages (note:
needs two steps), making the binary-dependent packages available.  Then
the remaining packages are installed into the pyvenv with --no-cache-dir
to avoid saving files. And there we are.

Image is still very large (>500 MB), but programs like reportlab require
font rendering capabilities so they pull in large portions of the Linux
graphics stack. Not much will shrink that.
2015-08-20 01:25:31 -07:00
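The idea of a virtual environment that can see the system's site-packages (9dad40b5a3) can be sketched with the standard library venv module, the successor to the pyvenv script. This is an illustration only, not the Dockerfile's actual commands: the helper name is hypothetical, pip is left out to keep the example self-contained, and the two-step setup mentioned in the commit message is not reproduced.

```python
import tempfile
import venv
from pathlib import Path


def create_env_with_system_packages(target: Path) -> str:
    """Create a virtual environment at target and return its pyvenv.cfg."""
    builder = venv.EnvBuilder(
        system_site_packages=True,  # expose system-installed packages
        with_pip=False,             # assumption: skip pip for this sketch
    )
    builder.create(str(target))
    return (target / "pyvenv.cfg").read_text()


with tempfile.TemporaryDirectory() as tmp:
    config_text = create_env_with_system_packages(Path(tmp) / "env")
```

The resulting pyvenv.cfg records the setting as include-system-site-packages = true, so binary-dependent packages installed by the distribution remain importable inside the environment.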
James R. Barlow
8e2d690cb0 Rework Dockerfile, setup.py to work with wheels for better cache use 2015-08-19 13:43:32 -07:00
James R. Barlow
2dff3e07ce Drop libxml2 dependency
It seems that Python's internal XML parser is good enough to do the job.
2015-08-17 15:26:07 -07:00
James R. Barlow
53c88093ad Bump to -rc5 2015-08-16 02:19:04 -07:00
James R. Barlow
30072e0c70 Pillow sucks
Far from being fluffy or friendly, Pillow silently allows installation
of itself without support for major image types.  Reportlab calls for
pillow 2.4.0.  On Ubuntu 14.04 LTS this will trigger an upgrade of
pillow that will be built without JPEG or ZLIB so it is effectively
neutered, and unfortunately Pillow will not detect this situation at
install time and guide users to a resolution.  Instead, you see nasty
stack traces.

So add a run-time check to ensure that Pillow is sane and capable of JPEG
and PNG support since both may be used internally.
2015-08-16 00:54:03 -07:00
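A run-time check of the kind described in 30072e0c70 might look like the sketch below. It is not the check the commit added: the function names are hypothetical, and it assumes a Pillow version that provides PIL.features.check_codec. The checker is injectable so the logic can be exercised without Pillow installed.

```python
def missing_codecs(required=("jpg", "zlib"), check=None):
    """Return the names of required codecs that Pillow lacks.

    PNG support in Pillow depends on zlib, so "zlib" stands in for PNG.
    """
    if check is None:
        # Imported lazily so the module loads even without Pillow.
        from PIL import features
        check = features.check_codec
    return [name for name in required if not check(name)]


def ensure_pillow_is_sane(check=None):
    """Raise a readable error instead of a stack trace deep in image code."""
    missing = missing_codecs(check=check)
    if missing:
        raise RuntimeError(
            "Pillow was built without support for: " + ", ".join(missing)
        )


# Simulate a Pillow build that lacks JPEG support.
simulated_missing = missing_codecs(check=lambda name: name != "jpg")
```

Calling ensure_pillow_is_sane() once at startup turns a silently incomplete Pillow build into an immediate, explicit failure.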
James R. Barlow
eb04a890b2 Relax Pillow requirement for Ubuntu 14.04 LTS 2015-08-15 15:55:56 -07:00
James R. Barlow
0c53adb04f setup: rollback lxml version to 3.3.3 - that's the latest in Ubuntu 14.04 2015-08-15 15:25:58 -07:00
James R. Barlow
87aeeacb04 Fix erroneous instruction to "apt-get install tesseract"
Should be tesseract-ocr
2015-08-15 15:17:38 -07:00
James R. Barlow
f6f4705ea3 Remove Java from setup.py 2015-08-14 00:44:56 -07:00
James R. Barlow
11dd9f14c3 setup.py: block unsafe 'upload', say to use twine instead 2015-08-09 14:16:30 -07:00
James R. Barlow
16d24f1166 Bump version to -rc4 2015-08-05 23:26:38 -07:00
James R. Barlow
a036de318e Replace mupdf and poppler with qpdf
Drop two dependencies and replace them with one that does the job of
both.  Smells like progress.

mupdf does PDF file repair and rendering
poppler does rendering and page splitting
qpdf does PDF file repair and page splitting
ghostscript does PDF file repair, rendering, and page splitting (sort of)

So we use qpdf.  Ghostscript's page splitting is supposedly less
efficient because it reprints the page (PDF -> Postscript -> PDF) and
possibly loses quality.  qpdf's library could be used to improve
performance.

This causes a slight performance regression:

py.test tests/test_main.py::test_maximum_options went from 187 seconds
up to 192.  This is likely due to O(n) serialized invocations of qpdf
compared to a single serialized call to pdfseparate.  Could improve on
this situation by using the example code in qpdf: pdf-split-pages.cc
or create marker files in split_pages() and then write a new @transform
function that would split pages on each CPU.  Probably not worth it,
overall, unless this causes problems on files with hundreds of pages.
2015-07-30 04:16:35 -07:00
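The O(n) invocations mentioned in a036de318e can be pictured with the sketch below, which only builds the command lines and does not run qpdf. It is an assumption-laden illustration rather than the project's split_pages(): the function name and the output filename pattern are hypothetical, and the commands use qpdf's --pages option to select a single page per call.

```python
def qpdf_split_commands(input_pdf, page_count,
                        output_pattern="page-{:06d}.pdf"):
    """Build one qpdf command per page: n pages means n invocations."""
    commands = []
    for page in range(1, page_count + 1):  # qpdf page numbers start at 1
        commands.append([
            "qpdf", input_pdf,
            "--pages", input_pdf, str(page), "--",
            output_pattern.format(page),
        ])
    return commands


commands = qpdf_split_commands("input.pdf", 3)
```

Each list could be passed to subprocess.run; since the commands are independent of one another, they are also the natural unit to distribute across CPUs if the serialized calls ever become a bottleneck.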
James R. Barlow
9918c4020e Use img2pdf in test case because it does a better job 2015-07-30 03:35:56 -07:00
James R. Barlow
47e50f82c4 setup.py: allow mutool 1.7 2015-07-28 13:37:32 -07:00