* Add notion connector and supporting code
* minor fixes
* Add notion deps to extras
* Use the same return type for both helper methods
* Don't ignore types that aren't recognized when mapping json
* Add support for recursively getting docs
* Add recursive search for databases
* fix logging
* fix linting
* remove debugging code
* add param
* expected test
* add option (to do doc nit)
* test with api for now
* typo
* test with api key
* use local only
* encoding -> partition-encoding
* changelog and version
* Update ingest test fixtures (#1055)
Co-authored-by: yuming-long <yuming-long@users.noreply.github.com>
* ignore coordinates
* no whitespace lol
* Update ingest test fixtures (#1061)
Co-authored-by: yuming-long <yuming-long@users.noreply.github.com>
---------
Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: yuming-long <yuming-long@users.noreply.github.com>
* add auto_paragraph_grouper. add line break pattern.
* combine group_broken_paragraph and blank_line_grouper function
* fix make check errors
* fix make check errors
* fix make check errors
* fix make check errors
* run make tidy to fix errors
* tidy core.py and text.py
* fix blank-line breaker to extend the result and replace new line with space
* fix function name typo
* call group_broken_paragraphs for blank_line_grouper
* edit function name from one_line_grouper to new_line_grouper for consistency
* edit threshold from 0.5 to 0.1
* edit threshold from 0.5 to 0.1
* Revert "call group_broken_paragraphs for blank_line_grouper"
This reverts commit 8fb93b7aa7c4d7e0320ac1e09c77da44c9b6c7d9.
* revert to commit 8fb93b7 and change threshold from 0.5 to 0.1
* edit test_text assertion. remove all BULLETS_PATTERN.
* Update ingest test fixtures (#1052)
Co-authored-by: ahmetmeleq <ahmetmeleq@users.noreply.github.com>
* edit test case in test_xml_partition
* update assertion on test_auto
---------
Co-authored-by: Klaijan Sinteppadon <klaijan@Klaijans-MacBook-Pro.local>
Co-authored-by: Klaijan Sinteppadon <klaijan@klaijans-mbp.mynetworksettings.com>
Co-authored-by: Klaijan Sinteppadon <klaijan@Klaijans-MBP.fios-router.home>
Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: ahmetmeleq <ahmetmeleq@users.noreply.github.com>
* feat: add functionality to track emphasized text (`bold/italic` formatting) from paragraph
* chore: add docstring
* chore: fix lint errors
* feat: ignore spaces when extracting emphasized texts from a paragraph
* feat: add functionality to track emphasized text (`bold/italic` formatting) from table
* test: add test case for grabbing emphasized texts from element metadata
* chore: fix lint errors
* chore: update changelog & version
* Update ingest test fixtures (#1047)
* feat: add func for checking on EmailAddress type
* feat: add EmailAddress type
* feat: add check for email type
* feat: add test for checking EmailAddress type
* feat: update existing example files with email
* feat: add new example files with email in the text
* fix: apply linter
* feat: update changelog file
* feat: add test for is_email_address function
* don't push
* fix: clean up code
* apply linter
* fix: clean up
* fix: remove file changes
* fix: remove not used files for email address test
* fix: remove not necessary tests
* clean up
* fix: apply linter
* fix: update CHANGELOG
* fix: change version
* fix: fix msg test
* fix: apply linter for tests
* fix: remove spaces
* fix: apply linter with longer line
* feat: update documentation
* fix: remove duplicates
* Update getting_started.rst
---------
Co-authored-by: Matt Robinson <mrobinson@unstructured.io>
* don't push
* enhancement: improve json detection by detect_filetype (#971)
* update regex pattern
* improve json regex pattern checks and add test file
* update file name
* update tests and formatting
* update changelog and version
* refactor: simplifies JSON detection and add tests (#975)
* refactor json detection
* version and changelog
* fix mock in test
* feat: adds Outlook connector (#939)
* bonus: fixes issue with email partitioning where From field was being assigned the To field value.
* Roman/expose dpi param (#966)
* Bump inference version
* Pass through the dpi param if available
* Update CHANGELOG
* Check dpi param passed in via unit test
* Bump inference version
* Fix unit test around file info to work on mac as well
* chore: cleanup changelog for 0.8.2 (#976)
* Update `partition_via_api` to not post a strategy value if not user specified (#967)
* remove default strategy
* working on test
* fixed test, coordinates param needed to be included
* nits
* update changelog
* lint
* update requirements
* build(release): cut 0.8.4 release (#979)
* feat: add document date for remaining file types (#930) (#969)
* feat: add document date for remaining file types (#930)
* feat: add functions for getting modification date
* feat: add date field to metadata from csv file
* feat: add tests for csv partition
* feat: add date field to metadata from html file
* feat: add tests for html partition
* fix: return file name only if possible
* feat: add csv tests
* fix: renaming
* feat: add field metadata_date as date of last mod
* feat: add tests for partition_docx
* feat: add field metadata_date to .doc file
* feat: add tests for partition_doc
* feat: add metadata_date to .epub file
* feat: add tests for partition_epub
* fix: fix test mocking
* feat: add metadata_date for image partition
* feat: add test for image partition
* feat: add coordinate system argument
* feat: add date to element metadata
* feat: add metadata_date for JSON partition
* feat: add test for JSON partition
* fix: rename variable
* feat: add metadata_date for md partition
* feat: add test for md partition
* feat: update doc string
* feat: add metadata_date for .odt partition
* feat: update .odt string
* feat: add metadata_date for .org partition
* feat: add tests for .org partition
* feat: add metadata_date for .pdf partition
* feat: add tests for .pdf partition
* feat: add metadata_date for .pptx partition
* feat: add metadata_date for .ppt partition
* feat: add tests for .ppt partition
* feat: add tests for .pptx partition
* feat: add metadata_date for .rst partition
* feat: add tests for .rst partition
* fix: get modification date after file checking
* feat: add tests for .rtf partition
* feat: add tests for .rtf partition
* feat: add metadata_date for .txt partition
* fix: rename argument
* feat: add tests for .txt partition
* feat: update doc string rst partition function
* feat: add metadata_date for .tsv partition
* feat: add tests for .tsv partition
* feat: add metadata_date for .xlsx partition
* feat: add tests for .xlsx partition
* fix: clean up
* feat: add tests for .xml partition
* feat: add tests for .xml partition
* fix: use `or ` instead of `if`
* fix: fix epub tests
* fix: remove not used code
* fix: add try block for getting file name
* fix: applying linter changes
* fix: fix test_partition_file
* feat: add metadata_date for email
* feat: add test for email partition
* feat: add metadata_date for msg
* feat: add tests for msg partition
* feat: update CHANGELOG file
* fix: update partitions doc string
* don't push
* fix: clean up code
* linting, linting, linting
* remove unnecessary example doc
* update version and changelog
* ingest-test-fixtures-update
* set metadata date in test
---------
Co-authored-by: Matt Robinson <mrobinson@unstructuredai.io>
* ingest-test-fixtures-update
* Update ingest test fixtures (#970)
Co-authored-by: MthwRobinson <MthwRobinson@users.noreply.github.com>
* Revert "Update ingest test fixtures (#970)"
This reverts commit 1d182ae474b3545b15551fffc15977757d552cd2.
* remove date from metadata in outputs
* update docstring ordering
* remove print
* remove print
* remove print
* linting, linting, linting
* fix version and test
* fix changelog
* fix changelog
* update version
---------
Co-authored-by: kravetsmic <79907559+kravetsmic@users.noreply.github.com>
Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: MthwRobinson <MthwRobinson@users.noreply.github.com>
* Chore: add uns api repo unittests (#954)
* stage
* git clone
* ci ignore markdown file
* make install
* use env instead
* remove md
* add script
* wrong env value
* add note
* maybe don't rm
* no cd../
---------
Co-authored-by: cragwolfe <crag@unstructured.io>
* fix: handling for empty tables in word docs and powerpoints (#982)
* fix table index error
* changelog and version
* fix: only download nltk packages if necessary (#985)
* fix: only download nltk if necessary
* changelog and version
* Chore: Pass table support param to partition image (#973)
* add param and test in image table extraction
* version and changelog
* need to publish this one for api repo
* add new param skip_infer_table_types
* use warning
* clean up with mapping
* add test for tsv
* fix test fail
* weird change from merge
* doc nit
* don't use mapping
* correct conflict
* Update pip in makefile (#981)
* update pip in makefile
* merge and update requirements
* update version
* update outlook requirements
* chore: remove debug printing (#988)
* fix: correct nltk download arg order (#991)
* fix: correct download order to nltk args
* add smoke test for tokenizers
* Chore: put back function `split_by_paragraph` (#992)
* put back function
* not really fixes
* don't push
* fix: clean up code
* fix: clean up
* fix: clean up
* Roman/ingest refactor (#978)
* Pull out s3 code as subcommand
* Pull out dropbox code as subcommand
* Pull out azure code as subcommand
* Pull out fsspec code as subcommand
* Pull out github code as subcommand
* Pull out gitlab code as subcommand
* Pull out reddit code as subcommand
* Pull out slack code as subcommand
* Pull out discord code as subcommand
* Pull out wikipedia code as subcommand
* Pull out gdrive code as subcommand
* Pull out biomed code as subcommand
* rename parameters
* Pull out onedrive code as subcommand
* Pull out outlook code as subcommand
* Pull out local code as subcommand
* Pull out elasticsearch code as subcommand
* Pull out confluence code as subcommand
* Drop previous main file
* update changelog
* Add back in mp.Pool
* Fix mypy issues with click
* Make sure all tests run with verbose flag
* refactor approach to dynamically add common options to each subcommand, scrub logging of options for sensitive data
* Pull out some more shared options
* Support running code via python as well as cli
* update ingest readme and move it to the ingest folder
* update usage in connector docs
* move local command arg in test
* Separate out cli code from logic running unstructured
* Make some cli fields required rather than optional
* rename process -> processor
* Improve logger to avoid duplicate handlers
---------
Co-authored-by: Ryan Nikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
* feat: adds Box connector (#996)
* chore: rename Element's "date" field to "last_modified" (#997)
Change the Element's date field name to the more specific last_modified so there is less room for confusion about what that field represents.
* don't push
* fix: remove prints
* remove unused file
* fix: apply linter
* feat: add post processing filter_element_types
* feat: add tests for filter_element_types
* feat: update changelog
* feat: add doc string for filter_element_types
* fix: change the version
* feat: update documentation
* bump dev version number
* cleanup changelog
* linting, linting, linting
---------
Co-authored-by: John <43506685+Coniferish@users.noreply.github.com>
Co-authored-by: Matt Robinson <mrobinson@unstructured.io>
Co-authored-by: David Potter <potterdavidm@gmail.com>
Co-authored-by: Roman Isecke <136338424+rbiseck3@users.noreply.github.com>
Co-authored-by: shreyanid <42684285+shreyanid@users.noreply.github.com>
Co-authored-by: cragwolfe <crag@unstructured.io>
Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: MthwRobinson <MthwRobinson@users.noreply.github.com>
Co-authored-by: Yuming Long <63475068+yuming-long@users.noreply.github.com>
Co-authored-by: Matt Robinson <mrobinson@unstructuredai.io>
* chore: CFR double column sample
Federal Regulations document for example notebook in `examples/custom-layout-order`
* chore: custom-layout-order example dir
* feat: helper methods to plot and reorder layouts
Helper methods: `plot_image_with_bounding_boxes_coloured` and `reorder_elements_in_double_columns`
* chore: delete __init__.py
---------
Co-authored-by: Benjamin Torres <benjats07@users.noreply.github.com>
* split dependencies by document type
* make pip-compile with new requirements
* add extra requirements to setup.py
* add in all docs; re pip-compile
* extra for all docs
* add pandas to xlsx
* dependency requires for tsv and csv
* handling for doc, docx and odt
* dependency check for pypandoc
* required dependencies for pandoc files
* xml and html
* markdown
* msg
* add in pdf
* add in pptx
* add in excel
* add lxml as base req
* extra all docs for local inference
* local inference installs all
* pin pillow version
* fixes for plain text tests
* fixes for doc
* update make commands
* changelog and version
* add xlrd
* update pip-compile
* pin numpy for python 3.8 support
* more constraints
* constraint on scipy
* update install docs
* constrain ipython
* add outlook to pip-compile
* more ipython constraints
* add extras to dockerfile
* pin office365 client
* few doc tweaks
* types as strings
* last pip-compile
* re pip-compile
* make tidy
* make tidy
This PR is to reflect changes in the unstructured-inference PR #152
* Update functionality to retrieve image metadata from a page for document_to_element_list
* track tags in html
* pass through links as metadata
* add test for grabbing links
* one more link
* changelog and version
* update docs
* fix tests
* update empty link assertion
* ingest-test-fixtures-update
* Update ingest test fixtures (#961)
* add min_partition
* functioning _split_content_to_fit_min_max
* create test and make tidy/check
* fix rebase issues
* fix type hinting, remove unused code, add tests
* various changes and refactoring of methods
* add test, refactor, change var names for debugging purposes
* update test
* make tidy/check
* give more descriptive var names and add comments
* update xml partition via partition_text and create test
* fix <pre> bug for test_partition_html_with_pre_tag
* make tidy
* refactor and fix tests
* make tidy/check
* ingest-test-fixtures-update
* change list comprehension to for loop
* fix error check
There's an issue in unstructured-inference about these blocks trapping unrelated import errors. The fix for that would be to narrow the scope of the traps, but I think this is made redundant by the requires_dependencies decorator, so I removed it completely.
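For context, a rough sketch of why a dependency-checking decorator can make broad import traps redundant. This is not the library's code: `requires_dependencies_sketch`, its `extras` argument, and the error message are hypothetical names chosen for illustration, and it only handles top-level module names. It looks the module up without importing it, so an unrelated import error raised inside an installed package is not swallowed.

```python
import importlib.util
from functools import wraps
from typing import Callable, List, Optional, Union


def requires_dependencies_sketch(
    dependencies: Union[str, List[str]],
    extras: Optional[str] = None,
) -> Callable:
    """Hypothetical stand-in: fail with one clear error if a module is absent."""
    if isinstance(dependencies, str):
        dependencies = [dependencies]

    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(*args, **kwargs):
            # find_spec returns None for a top-level module that is not
            # installed; it does not execute the module, so errors raised
            # inside an installed package are never caught here.
            missing = [
                dep for dep in dependencies
                if importlib.util.find_spec(dep) is None
            ]
            if missing:
                hint = f" (try installing the '{extras}' extra)" if extras else ""
                raise ImportError(
                    f"Missing required dependencies: {', '.join(missing)}{hint}"
                )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@requires_dependencies_sketch("json")
def load_config(text: str) -> dict:
    # The import lives inside the function, so it only runs after the check.
    import json

    return json.loads(text)
```

With a check like this at the function boundary, a surrounding `try/except ImportError` block has nothing left to guard, which is the reasoning behind removing it.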