25 Commits

Author SHA1 Message Date
James R. Barlow
6a4df78bc0 Add _naive_find_text to search for text when fitz is not available 2018-03-27 13:36:17 -07:00
James R. Barlow
6756016572 Add license notice to all files
Source files to GPL3

Exceptions:
-tests/spoof/* to MIT
-hocrtransform.py
-_unicodefun.py

Test resources to CC BY-SA 4.0 except when otherwise noted.

Add GPL license.
2018-03-24 02:33:24 -07:00
James R. Barlow
45c7bd9a60 lint: Remove shebangs from non-executable files 2018-02-24 12:38:58 -08:00
James R. Barlow
6ff6c8614f --output-type=pdf now outputs /UserUnit PDFs at the correct size
This currently distorts the output size because Tesseract assumes it
knows the DPI better than we do.

Does not work for Ghostscript, because it emerges that Ghostscript
honors /UserUnit for rasterizing but not in pdfwrite (resolve/wontfix).

https://bugs.ghostscript.com/show_bug.cgi?id=690781

Ghostscript’s output would need to be patched in a PDF/A safe way for
this to work. Temporary route may be to block Ghostscript if
/UserUnit.
2017-05-24 23:26:07 -07:00
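For context on the /UserUnit entry above: the PDF specification defines /UserUnit as a multiplier on the default user-space unit of 1/72 inch, so a page's physical size is its MediaBox dimensions times /UserUnit, divided by 72. A minimal sketch of that arithmetic follows; `page_size_inches` is a hypothetical helper for illustration and is not taken from this repository.

```python
def page_size_inches(mediabox, user_unit=1.0):
    """Return (width, height) of a page in inches.

    mediabox  -- (llx, lly, urx, ury) in user-space units
    user_unit -- the page's /UserUnit value; 1.0 when the key is absent
    """
    llx, lly, urx, ury = mediabox
    # One user-space unit is user_unit / 72 inches.
    width = abs(urx - llx) * user_unit / 72.0
    height = abs(ury - lly) * user_unit / 72.0
    return width, height


# A US Letter MediaBox, first with the default unit and then with /UserUnit 2.
letter = (0, 0, 612, 792)
default_size = page_size_inches(letter)
doubled_size = page_size_inches(letter, user_unit=2.0)
```

A tool that ignores /UserUnit when writing output would report the first size even for a page that declares the second, which is the kind of mismatch the commit message describes.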
James R. Barlow
d9005a1074 pdfinfo: replace most remaining dict-style access 2017-05-19 16:17:36 -07:00
James R. Barlow
08e47117a3 Rename pageinfo to pdfinfo 2017-05-19 15:48:23 -07:00
James R. Barlow
8694f8d2eb Replace magic strings colorspace and encoding with Enums 2017-05-18 22:32:27 -07:00
James R. Barlow
56d2aae963 Refactor from ImageInfo index to attribute accessing 2017-05-18 18:39:14 -07:00
James R. Barlow
caee5b1428 Access PageInfo instance variables instead of dictionary 2017-05-18 17:12:04 -07:00
James R. Barlow
cd04ae6949 Refactor PdfInfo(str(filename)) -> PdfInfo(filename) 2017-05-18 16:43:50 -07:00
James R. Barlow
6a0b68298f Refactor pdf_get_all_pageinfo to PdfInfo 2017-05-18 16:31:18 -07:00
James R. Barlow
96045e98f4 Update develop with master changes
We’re well out of the “trivial updates” zone
2017-05-11 22:54:27 -07:00
James R. Barlow
aa859a4139 Fix #156 - NoneType has no ‘getObject’ for pages with no /Contents 2017-05-01 15:46:15 -07:00
James R. Barlow
89599b4812 Drop Python 3.4 compatibility 2017-03-29 15:46:53 -07:00
James R. Barlow
d1a0065ef8 Create test case for Form XObjects 2017-02-14 12:51:15 -08:00
James R. Barlow
b889a89c36 Fix remaining 3.4/3.5 regressions 2017-01-26 17:53:27 -08:00
James R. Barlow
02fba02d31 Refactor test suite to use fixtures to manage paths 2017-01-26 16:38:59 -08:00
James R. Barlow
fb9e7c82f6 Move duplicate test code into common namespace 2017-01-26 13:36:52 -08:00
James R. Barlow
1c8b763d53 test_pageinfo: Remove bits per component test
The behavior of this test will ultimately depend on what version of
img2pdf is installed, since after my patch it will be able to produce
1bpp images.
2016-11-07 14:35:54 -08:00
James R. Barlow
570bbe9a05 Add comments and remove debugging, improve inline handling
Squashed commits:
[bfff3c9] pageinfo, have a main()
2016-02-27 00:18:36 -08:00
James R. Barlow
5cc3adb39a Add support for inline images 2016-02-27 00:18:36 -08:00
James R. Barlow
3957a0606c Compute image pixel density without performing rectangle intersection (+5 squashed commits)
Squashed commits:
[0e27904] Partially implement DPI calculation with rotation of the image

Fixes test suite
[a64f662] pageinfo: all tests pass
[c5b811a] Fix typos
[cdd2286] Can now find inline images more efficiently
[60dde8d] First cut at implementing intelligent DPI detection based on content stream

Broke many of the test cases
2016-02-27 00:18:36 -08:00
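The idea named in the entry above, computing pixel density from the content stream even when the image is rotated, can be illustrated with the transformation matrix alone. In PDF, an image is painted into a unit square that the current transformation matrix (CTM) `[a b c d e f]` maps onto the page, so the drawn width and height are the lengths of the vectors `(a, b)` and `(c, d)`. The sketch below is an assumption about the general technique, not the code from this commit; `image_dpi` is a hypothetical name and /UserUnit is taken to be 1.

```python
from math import hypot


def image_dpi(pixel_width, pixel_height, ctm):
    """Estimate (horizontal, vertical) DPI of an image from its CTM.

    ctm is the six-number PDF matrix (a, b, c, d, e, f) in effect when the
    image is painted. The unit square's edges map to the vectors (a, b) and
    (c, d), whose lengths are the drawn size in points (1/72 inch).
    """
    a, b, c, d, _e, _f = ctm
    drawn_width_pt = hypot(a, b)
    drawn_height_pt = hypot(c, d)
    if drawn_width_pt == 0 or drawn_height_pt == 0:
        raise ValueError("degenerate CTM: image has no drawn area")
    return (
        pixel_width / (drawn_width_pt / 72.0),
        pixel_height / (drawn_height_pt / 72.0),
    )


# A 600x300 pixel image drawn 2 in x 1 in, upright and rotated 90 degrees.
upright = image_dpi(600, 300, (144, 0, 0, 72, 0, 0))
rotated = image_dpi(600, 300, (0, 144, -72, 0, 0, 0))
```

Because only vector lengths are used, rotation does not change the result, and no intersection between the image rectangle and the page rectangle is needed.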
James R. Barlow
0dc96442d8 Fix img2pdf usage in test case (to make Travis CI happy again) 2016-02-06 23:41:32 -08:00
James R. Barlow
354e61946e Use os.makedirs for test output directories
Broke Travis
2016-01-16 02:47:56 -08:00
James R. Barlow
7c558b3713 Move pageinfo test into tests folder 2016-01-11 17:40:44 -08:00