Add functionality to try other common encodings for html, xml files if an error related to the encoding is raised and the user has not specified an encoding.
Change auto.py to have a None default for encoding
Remove the unused parameter encoding from partition_pdf
Add functionality to the read_txt_file utility function to handle file-like object from URL
* first pass on partition_xml
* add option to keep xml tags
* added tests for xml
* fix filename
* update filenames
* remove outdated readme
* add xml to auto
* version and changelog
* update readme and docs
* pass through include_metadata
* update include_metadata description
* add README back in
* linting, linting, linting
* more linting
* spooled to bytes doesnt need to be a tuple
* Add tests for newly supported filetypes
* Correct metadata filetype
* doc typo
Co-authored-by: qued <64741807+qued@users.noreply.github.com>
* typo fix
Co-authored-by: qued <64741807+qued@users.noreply.github.com>
* typo fix
Co-authored-by: qued <64741807+qued@users.noreply.github.com>
* keep_xml_tags -> xml_keep_tags
---------
Co-authored-by: Alan Bertl <alan@unstructured.io>
Co-authored-by: qued <64741807+qued@users.noreply.github.com>