13 Commits

Author SHA1 Message Date
qued
a75499d465
feat: local inference (#125)
Splits partition_pdf into two paths: one for local inference when url is None, and one for inference via the API when url is a string.
2023-01-04 16:19:05 -06:00
Matt Robinson
17045aed80
feat: add convert_to_dataframe staging brick (#127)
* add pandas to deps; pip-compile

* staging brick to convert elements to dataframe

* bump version

* add convert_to_dataframe docs

* bump wheel version

* typo fix

* typo fix 2!
2023-01-04 12:04:59 -05:00
Matt Robinson
b14f6ac9bd
feat: extract metadata from .docx, .xlsx, and .jpg (#113)
* add python-docx dependency

* added function for extracting metadata from word documents

* add openpyxl

* added get_jpg_metadata; fixed typing

* bump changelog

* added pillow to dependencies
2022-12-26 09:34:36 -05:00
Matt Robinson
407f700b20
build(deps): bump certifi to incorporate security patches (#105)
* pin certifi in base and huggingface

* pinning for build and docs
2022-12-19 14:47:15 -05:00
Matt Robinson
b1cce16c16
feat: translate_text cleaning brick (#101)
* initial implementation for translate brick

* more input validation

* tests for translate brick

* added docs

* bumped version

* chinese and arabic tests

* re-run pip-compile

* add torch to dependencies

* cleanup doc string

* fix long string

* fix typo in docs

* take out empty string check

* return string if string is empty

* added huggingface into make install
2022-12-15 15:35:15 -05:00
Matt Robinson
5c4428413a
build(deps): Bump jupyter-core library (#85)
2022-11-30 10:04:56 -05:00
asymness
2170a2aae2
feat: Implement Argilla staging brick (#81)
* Add argilla to dependencies and run pip-compile

* Implement Argilla staging brick and add unit tests

* Update version and changelog

* Update docs with description and usage for Argilla staging brick

* Remove unused fixtures and fix typo in Argilla tests

* add missing quote in docs

* changelog tweak

* doc tweaks

Co-authored-by: Matt Robinson <mrobinson@unstructuredai.io>
Co-authored-by: Matt Robinson <mrobinson@unstructured.io>
2022-11-28 14:41:48 +00:00
Mallori Harrell
53fcf4e912
chore: Remove PDF parsing code and dependencies (#75)
Remove PDF parsing code and dependencies.
2022-11-21 11:47:29 -06:00
Yuming Long
7c61639f23
python_require (#65)
2022-11-11 12:15:23 -05:00
Matt Robinson
2715950d6f
chore: Add long description content type; bump version (#59)
2022-11-08 16:55:41 -05:00
benjats07
6b3e86c508
docs: Added long description to PyPi (#58)
* docs: Added long description to PyPi

* Added fields for description in PyPi
2022-11-08 15:22:43 -06:00
Matt Robinson
fb16847946
feat: Staging brick for attention window chunking (#34)
* add huggingface dependencies and re pip-compile

* first pass on chunk by attention window

* test for chunking function

* completed tests for chunk_by_attention_window

* change default buffer size to 2

* wrapper function for staging

* added docs for transformers

* fix wording and typos

* updated change log and bumped the version

* added docs on huggingface dependencies

* fix typo

* re pip-compile
2022-10-13 11:18:27 -04:00
Matt Robinson
5f40c78f25
Initial Release
2022-09-26 14:55:20 -07:00