6 Commits

Author SHA1 Message Date
Matt Robinson
a800967478
enhancements: add page numbers for word docs when available (#750)
* add support for page numbers in docx when present

* version and changelog

* add comment on page numbers

* add header and footer to doc elements list

* update integrations docs

* include_page_breaks kwarg for doc and docx

* merge element metadata for pagebreaks

* fix typo

* fix changelog typo

* change page number default to None

* add initial_page_number kwarg

* make page number tests in pdf more explicit

* revert test file

* update ingest tests

* update test fixture outputs

* updates to IRS forms fixtures

* ingest-test-fixtures-update

* Update ingest test fixtures (#759)

Co-authored-by: MthwRobinson <MthwRobinson@users.noreply.github.com>

---------

Co-authored-by: Unstructured-DevOps <111007769+Unstructured-DevOps@users.noreply.github.com>
Co-authored-by: MthwRobinson <MthwRobinson@users.noreply.github.com>
2023-06-15 12:21:17 -04:00
Matt Robinson
c35fff2972
feat: Add stage_for_weaviate and schema creation function (#672)
* add weaviate docker compose

* added staging brick and tests for weaviate

* initial notebook and requirements file

* add commentary to weaviate notebook

* weaviate readme

* update docs

* version and change log

* install weaviate client

* install weaviate; skip for docker

* linting, linting, linting

* install weaviate client with deps

* comments on weaviate client

* fix module not found error for docker container

* skipped wrong test in docker

* fix typos

* add in local-inference
2023-06-01 20:48:54 +00:00
Matt Robinson
981805e435
feat: stage_for_baseplate function (#546)
* added a staging brick for baseplate

* added a test for baseplate

* update documentation

* version and changelog
2023-05-04 11:05:38 -04:00
Matt Robinson
75cf233702
feat: add partition_msg for MSFT Outlook files (#412)
* added msg-parser dependency

* pass through kwargs in convert_file_to_text

* added partition_msg for processing msft outlook files

* version bump and changelog

* added tests for partition_msg

* added test for msg with plain text

* add partition_msg docs; fix underlines in integration docs

* add .msg to file list

* finish tests for auto msg

* linting, linting, linting
2023-03-28 20:15:22 +00:00
Sebastian Laverde Alfonso
c9c1b843d2
docs: Integrations LangChain code fix (#378) 2023-03-17 22:59:22 +01:00
Sebastian Laverde Alfonso
b2f37c3eff
Docs: add Integrations section (#372)
* docs: update index, add integrations

* docs: fix typos

* docs: create integrations.rst section structure

* docs: descriptions and use for 8 integrations

* refactor: SEC example in Label Studio section

* Apply suggestions from code review

Co-authored-by: qued <64741807+qued@users.noreply.github.com>

* docs: change links order and refactor|paraphrase

---------

Co-authored-by: qued <64741807+qued@users.noreply.github.com>
2023-03-17 19:11:38 +00:00