9 Commits

Author SHA1 Message Date
John
dc6d7d7268
feat: add metadata_filename parameter across all partition functions (#811)
* fix conflicts

* add tests and clean metadata_filename in partitions

* fix test_email and remove comments

* make tidy/check

* update changelog and version

* fix tests

* make tidy again
2023-07-05 16:02:22 -04:00
Emily Chen
24ebd0fa4e
chore: Move coordinate details from Element model to a metadata model (#827) 2023-07-05 11:25:11 -07:00
John
e9fdbb0943
feat: add include_metadata across all partition functions (#853)
* add include_metadata kwarg and tests to parsers

add exclude_metadata to docx

add test for doc to exclude metadata

add include_metadata kwarg to email

add include_metadata kwarg to epub

add include_metadata kwarg to json

add exclude_metadata tests to md

add include_metadata kwarg and tests for msg parse

add include_metadata kwarg and tests for odt parse

add include_metadata kwarg and tests for org parse

add include_metadata kwarg and tests for ppt and pptx parse

add include_metadata kwarg and tests for rst parse

add include_metadata kwarg and tests for rtf parse

add include_metadata tests for text parse

add include_metadata tests for tsv parse

add include_metadata tests for xlsx parse

add include_metadata tests for xml parse

* WIP add include_metadata to partition_pdf

* add include_metadata tests to partition_pdf

* make tidy/check

* update changelog and version

* change test asserts and move docstring logic to process_metadata

* make tidy

* fix tests asserts

* linting, linting, linting

* sync versions

* skip api call test not on main

---------

Co-authored-by: Matt Robinson <mrobinson@unstructured.io>
Co-authored-by: Matt Robinson <mrobinson@unstructuredai.io>
2023-06-30 10:44:46 -04:00
Christine Straub
743482b6d3
Bug/635 unicode decode error eml (#739)
* Adds functionality to extract charset info from eml files
* Adds missed file-like object handling in detect_file_encoding
* Adds functionality to replace the MIME encodings for eml files with one of the
   common encodings if a unicode error occurs
* Organize the eml example files in the example-docs/eml directory
2023-06-17 00:52:13 +00:00
cragwolfe
3972c80c51
build(deps): bump requirements (#414) 2023-04-05 02:59:06 +00:00
Matt Robinson
414883455b
fix: correct order of kwargs in pandoc (#421)
* fix: correct order of kwargs in pandoc

* only skip epub tests in Docker

* changelog

---------

Co-authored-by: Crag Wolfe <crag@unstructuredai.io>
Co-authored-by: cragwolfe <crag@unstructured.io>
2023-03-30 20:54:29 +00:00
Matt Robinson
09b52b4fc4
fix: text kwargs no longer fail with empty string (#413)
* fix: text kwargs no longer fail with empty string

* linting
2023-03-28 21:03:51 +00:00
Amanda Cameron
a9da858fa3
chore: add tests for docker (#373) 2023-03-21 13:46:09 -07:00
natygyoon
6be07a5260
feat: update auto.partition() function to recognize Unstructured json (#337) 2023-03-08 10:36:01 -08:00