John
9f7bd6127b
enhancement: Add include_header kwarg for xlsx, default True ( #1125 )
Closes GitHub issue #1121.
Adds an include_header kwarg to partition_xlsx and changes the default behavior to True.
2023-08-17 04:16:23 +00:00
Christine Straub
0e887cc36b
Feat/1060 update metadata fields ( #1099 )
Closes GitHub issue #1060.
* update the metadata field links
* update the metadata field emphasized_texts
2023-08-16 04:33:06 +00:00
Christine Straub
b76d2ee745
feat: track emphasized text msword ( #1048 )
* feat: add functionality to track emphasized text (`bold/italic` formatting) from paragraph
* chore: add docstring
* chore: fix lint errors
* feat: ignore spaces when extracting emphasized texts from a paragraph
* feat: add functionality to track emphasized text (`bold/italic` formatting) from table
* test: add test case for grabbing emphasized texts from element metadata
* chore: fix lint errors
* chore: update changelog & version
* Update ingest test fixtures (#1047 )
2023-08-04 17:04:12 -04:00
Matt Robinson
f4ddf53590
feat: track emphasized text in partition_html ( #1034 )
* Feat/965 track emphasized text html (#1021 )
* feat: add functionality to track emphasized text (<strong>, <em>, <span>, <b>, <i> tags) in HTML
* feat: add `include_tail_text` parameter to `_construct_text`
* test: add test case for `_get_emphasized_texts_from_tag`
* test: add `emphasized_texts` to metadata
* chore: update changelog & version
* fix tests
* fix lint errors
* chore: update changelog
* chore: small comment updates
* feat: update `XMLDocument._read_xml` to create `<p>` tag element for the text enclosed in the `<pre>` tag
* chore: update changelog
* Update ingest test fixtures (#1026 )
Co-authored-by: christinestraub <christinestraub@users.noreply.github.com>
---------
Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: christinestraub <christinestraub@users.noreply.github.com>
Co-authored-by: Matt Robinson <mrobinson@unstructured.io>
* ingest-test-fixtures-update
* Update ingest test fixtures (#1035 )
Co-authored-by: MthwRobinson <MthwRobinson@users.noreply.github.com>
---------
Co-authored-by: Christine Straub <christinemstraub@gmail.com>
Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: christinestraub <christinestraub@users.noreply.github.com>
Co-authored-by: MthwRobinson <MthwRobinson@users.noreply.github.com>
2023-08-03 16:24:25 +00:00
Matt Robinson
6e852cbe70
feat: track links from anchor tags in partition_html ( #959 )
* track tags in html
* pass through links as metadata
* add test for grabbing links
* one more link
* changelog and version
* update docs
* fix tests
* update empty link assertion
* ingest-test-fixtures-update
* Update ingest test fixtures (#961 )
2023-07-24 18:28:56 +00:00
David Potter
3b472cb7df
feat: add google cloud storage connector ( #746 )
2023-06-21 15:14:50 -07:00