3727 Commits

Author SHA1 Message Date
James R. Barlow
f4d4ea46c8
Update artifact actions v16.0.1 2023-12-20 12:44:43 -08:00
James R. Barlow
2fd1a0f178
v16.0.1 release notes 2023-12-20 12:33:41 -08:00
James R. Barlow
73ed33a086
Tighten dependencies 2023-12-20 12:33:18 -08:00
James R. Barlow
e6095a9949
Fix text rendering issue with new hOCR text renderer 2023-12-20 12:26:06 -08:00
James R. Barlow
16f05af401
Fix release notes - drop rc from version 2023-12-18 20:08:45 -08:00
dependabot[bot]
1631afc878
Bump actions/setup-python from 4 to 5 (#1205)
Bumps [actions/setup-python](https://github.com/actions/setup-python) from 4 to 5.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](https://github.com/actions/setup-python/compare/v4...v5)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
v16.0.0
2023-12-17 15:35:13 -08:00
Robin Richtsfeld
63d87fc440
Fix --fast-web-view documentation (#1206) 2023-12-17 14:54:38 -08:00
James R. Barlow
9489c01259
Skip test_encrypted on Py3.12 + macOS v16.0.0rc2 2023-12-08 00:12:24 -08:00
James R. Barlow
30d92ad83f
Fix build settings to adjust for dropping py39 2023-12-07 23:40:45 -08:00
James R. Barlow
a4987733c4
Filter rl_safe_eval deprecation warning
eportlab/lib/rl_safe_eval.py:11: DeprecationWarning: ast.NameConstant is deprecated and will be removed in Python 3.14; use ast.Constant instead
    haveNameConstant = hasattr(ast,'NameConstant')

Warning triggered by reportlab-4.0.7 and Python 3.12
2023-12-07 23:40:23 -08:00
James R. Barlow
39eee05230
v16.0.0rc1 release notes
Fixes #1009, #1191, #1157
v16.0.0rc1
2023-12-03 15:44:34 -08:00
James R. Barlow
5b2f2e6290
Merge branch 'feature/modernhocr' 2023-12-03 15:17:02 -08:00
James R. Barlow
445617a1a5
Rebuild cache for hocr default case 2023-12-03 15:16:18 -08:00
James R. Barlow
f6e90a5934
hOCR renderer is now default 2023-12-02 19:58:00 -08:00
James R. Barlow
43618e6b3f
Move canvas API to pikepdf and import it 2023-12-02 19:42:35 -08:00
James R. Barlow
e97f89de3b
Refactor font so glyphless isn't as hard coded 2023-12-02 08:55:01 -08:00
James R. Barlow
11d3e32f1e
Fix hocrtransform CLI 2023-12-02 08:08:29 -08:00
James R. Barlow
2affa83efe
Remove code that attempted to manage xattrs out of output file
Feature requested in issue #1179, but caused #1195. On further review,
there is no platform independent way to manage extended attributes
and it is not clear copying them through is necessarily the sensible
thing to do.

Closes #1179.
2023-11-29 23:25:51 -08:00
James R. Barlow
c90d5cd84b
Fix Ghostscript installation instructions and add warning v15.4.4 2023-11-29 14:10:04 -08:00
James R. Barlow
aacaba3d26
Ignore pypy for now 2023-11-21 01:05:23 -08:00
James R. Barlow
fec53be841
Remove next major release deprecations 2023-11-21 00:47:51 -08:00
James R. Barlow
3f7b540f76
Drop Python 3.9 support 2023-11-21 00:46:00 -08:00
James R. Barlow
d217856166
Make hocrdebug work, and try to handle CJK spacing better 2023-11-21 00:33:02 -08:00
James R. Barlow
e2be457e9b
Avoid divzero 2023-11-20 23:08:00 -08:00
James R. Barlow
4850f486d2
Make text API more like an accessor 2023-11-20 22:59:50 -08:00
James R. Barlow
729c7febd9
Fix placement of spaces in debug mode 2023-11-20 22:44:12 -08:00
James R. Barlow
6c6aca2f1e
Refactor save_state 2023-11-20 22:29:21 -08:00
James R. Barlow
c69823f496
Refactor; accumulate content stream as bytes rather than discrete pikepdf objects 2023-11-20 22:11:59 -08:00
James R. Barlow
73f8f6aac8
Add RTL output - seems to work, but debug does not 2023-11-20 20:28:07 -08:00
James R. Barlow
d944254e45
hocr: typing cont'd 2023-11-20 17:07:52 -08:00
James R. Barlow
f7ddffe554
hocr: typing 2023-11-20 16:52:55 -08:00
James R. Barlow
8a73ed5d5a
Fix JBIG2 not updating progress bar 2023-11-20 16:25:30 -08:00
James R. Barlow
03669183d7
Rationalize canvas interface 2023-11-20 15:54:13 -08:00
James R. Barlow
74e101a2fa
Improve canvas interface with chaining 2023-11-20 14:42:48 -08:00
James R. Barlow
532cf18ad3
Restructure hocrtransform submodule to avoid having everything in __init__ 2023-11-20 00:57:58 -08:00
James R. Barlow
0b90b697e2
More tidying 2023-11-20 00:43:43 -08:00
James R. Barlow
6be7c5f7c8
Fix colors and space box rendering 2023-11-20 00:30:54 -08:00
James R. Barlow
db2e5132e6
Remove some obsolete parameters 2023-11-20 00:10:55 -08:00
James R. Barlow
b14f6f778a
Tidying new hOCR renderer 2023-11-19 23:51:27 -08:00
James R. Barlow
415de77457
imageops: fix annots since not using singledispatch anymore 2023-11-19 23:51:27 -08:00
James R. Barlow
a9466c4f58
Improve word box positioning 2023-11-19 23:51:27 -08:00
James R. Barlow
d9ae453a63
Significant improvement overall 2023-11-19 23:51:27 -08:00
James R. Barlow
9841e09233
More adjustments 2023-11-19 23:51:27 -08:00
James R. Barlow
0ca314e066
Replace Rect with pikepdf.Rectangle, migrate line matrix to page 2023-11-19 23:51:27 -08:00
James R. Barlow
d7680cae27
Correcting Matrix logic helps
The good: don't have to do inverse and intermediate transforms.

The bad: skew looks bad, partly because the hOCR coordinate system is inconsistent around skew?
2023-11-19 23:51:27 -08:00
James R. Barlow
491b6bdb1f
Remove concept of HOCR_OK_LANGS 2023-11-19 23:51:27 -08:00
James R. Barlow
c591f9601a
Remove Latin hOCR test 2023-11-19 23:51:27 -08:00
James R. Barlow
8d1e75017e
Remove reportlab backend and make reportlab a test-only dependency 2023-11-19 23:51:27 -08:00
James R. Barlow
94615f7ad4
hOCR now works for all languages 2023-11-19 23:51:27 -08:00
James R. Barlow
e5df8e1315
Nearly pixel perfect 2023-11-19 23:51:27 -08:00