* added python-pptx to requirements
* added filetype detection for powerpoint
* add more filetypes to detect
* more tests
* added tests for filetype
* reorder document types
* tests for get_directory_file_info
* added docs for get_directory_file_info
* bump version
* Word -> Office
* added test for filetype
* add group by filetype example
* add python-magic
* first pass on filetype detection
* tests for filetype detection
* more tests for file detection
* added tests for error conditions
* install libmagic dev in github
* libmagic install instructions
* pattern for checking email files
* support reading .eml in rb mode
* add auto partition function
* auto tests for email
* auto tests for docx
* added tests for html
* add pdf and html tests
* linting, linting, linting
* added docs for auto partitioning
* update readme with generic partition brick
* bumped version
* added test for bad type
* detect .docx files from application/octet-stream
* linting, linting, linting
* identify xlsx from octet stream
* install poppler in ci
* fix mocks; test for unknown type
* install poppler utils
* install in one line
* only poppler-utils
* file extension logic from application/octet-stream
* install local inference for ci
* install detectron2
* removing unused dockerfile
* initial implementation for translate brick
* more input validation
* tests for translate brick
* added docs
* bumped version
* chinese and arabic tests
* re-run pip-compile
* add torch to dependencies
* cleanup doc string
* fix long string
* fix typo in docs
* take out empty string check
* return string if string is empty
* added huggingface into make install
* Add argilla to dependencies and run pip-compile
* Implement Argilla staging brick and add unit tests
* Update version and changelog
* Update docs with description and usage for Argilla staging brick
* Remove unused fixtures and fix typo in Argilla tests
* add missing quote in docs
* changelog tweak
* doc tweaks
Co-authored-by: Matt Robinson <mrobinson@unstructuredai.io>
Co-authored-by: Matt Robinson <mrobinson@unstructured.io>
* add huggingface dependencies and re-run pip-compile
* first pass on chunk by attention window
* test for chunking function
* completed tests for chunk_by_attention_window
* change default buffer size to 2
* wrapper function for staging
* added docs for transformers
* fix wording and typos
* updated changelog and bumped the version
* added docs on huggingface dependencies
* fix typo
* re-run pip-compile