3 Commits

Author SHA1 Message Date
Matt Robinson
339c133326
fix: cleanup from live .docx tests (#177)
* add env var for cap threshold; raise default threshold

* update docs and tests

* added check for ending in a comma

* update docs

* no caps check for all upper text

* capture Text in html and text

* check category in Text equality check

* lower case all caps before checking for verbs

* added check for us city/state/zip

* added address type

* add address to html

* add address to text

* fix for text tests; escape for large text segments

* refactor regex for readability

* update comment

* additional test for text with linebreaks

* update docs

* update changelog

* update elements docs

* remove old comment

* case -> cast

* type fix
2023-01-26 15:52:25 +00:00
Matt Robinson
4f6fc29b54
fix: partition_html should process container divs that include text (#110)
* check for containers with text

* added tests for containers with text

* changelog and version bump
2022-12-21 21:51:04 +00:00
Matt Robinson
5f40c78f25
Initial Release
2022-09-26 14:55:20 -07:00