Matt Robinson
|
f4ddf53590
|
feat: track emphasized text in partition_html (#1034)
* Feat/965 track emphasized text html (#1021)
* feat: add functionality to track emphasized text (`<strong>`, `<em>`, `<span>`, `<b>`, `<i>` tags) in HTML
* feat: add `include_tail_text` parameter to `_construct_text`
* test: add test case for `_get_emphasized_texts_from_tag`
* test: add `emphasized_texts` to metadata
* chore: update changelog & version
* fix tests
* fix lint errors
* chore: update changelog
* chore: small comment updates
* feat: update `XMLDocument._read_xml` to create a `<p>` element for text enclosed in a `<pre>` tag
* chore: update changelog
* Update ingest test fixtures (#1026)
Co-authored-by: christinestraub <christinestraub@users.noreply.github.com>
---------
Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: christinestraub <christinestraub@users.noreply.github.com>
Co-authored-by: Matt Robinson <mrobinson@unstructured.io>
* ingest-test-fixtures-update
* Update ingest test fixtures (#1035)
Co-authored-by: MthwRobinson <MthwRobinson@users.noreply.github.com>
---------
Co-authored-by: Christine Straub <christinemstraub@gmail.com>
Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: christinestraub <christinestraub@users.noreply.github.com>
Co-authored-by: MthwRobinson <MthwRobinson@users.noreply.github.com>
|
2023-08-03 16:24:25 +00:00 |
|