* add support for text2text
* add support for token classification datasets
* bump versions
* updated docs
* remove extra comment
* fix wording in docs
* fix some more wording
* Add argilla to dependencies and run pip-compile
* Implement Argilla staging brick and add unit tests
* Update version and changelog
* Update docs with description and usage for Argilla staging brick
* Remove unused fixtures and fix typo in Argilla tests
* add missing quote in docs
* changelog tweak
* doc tweaks
Co-authored-by: Matt Robinson <mrobinson@unstructuredai.io>
Co-authored-by: Matt Robinson <mrobinson@unstructured.io>
* docs: add new feature to the CHANGELOG.md, bump the version, update __version__.py
* feat: new partition to call the document image analysis API
* fix: remove duplicated dependency on partition.py
* fix: linting error due to line-length > 100
* test: add test to call partition_pdf brick
* chore: new short example-doc pdf for speed up in test X8
* fix: add missing return statement to _read to pass check
* feat: new partitioning brick to call doc parse API
* docs: version update fix in CHANGELOG
* refactor: no nested ifs
* docs: documentation for new brick partition_pdf
* refactor: made tidy
* docs: minor doc refactor
Co-authored-by: Sebastian Laverde <sebastian@unstructured.io>
* brick to extract text before
* brick to extract text after
* tests for extract before and after
* updated docs
* changelog and bump version
* fix typo
* fix another typo
* positive -> non-negative
* added prefix and postfix cleaners
* added test for pre and postfix cleaners
* added docs for prefix and postfix bricks
* changelog and bump version
* add dev to version
* feat: Add staging brick for Datasaur token-based tasks
* Added docstring and formatting with flake8, mypy, and black
* docs: Added documentation for stage_for_datasaur
* fix: version sync correction
* fix: Corrections to docs for stage_for_datasaur
* fix: changes in naming of example variables
* Update docs/source/bricks.rst
Co-authored-by: Matt Robinson <mrobinson@unstructured.io>
* Implement convert_to_isd_csv function
* Add unit tests for convert_to_isd_csv function
* Update docs with description and example of convert_to_isd_csv function
* Update changelog and version
* add huggingface dependencies and re pip-compile
* first pass on chunk by attention window
* test for chunking function
* completed tests for chunk_by_attention_window
* change default buffer size to 2
* wrapper function for staging
* added docs for transformers
* fix wording and typos
* updated changelog and bumped the version
* added docs on huggingface dependencies
* fix typo
* re pip-compile
* Implement stage_for_label_box function
* Add unit tests for stage_for_label_box function
* Update docs with description and example for stage_for_label_box function
* Bump version and update CHANGELOG.md
* Fix linting issues and implement suggested changes
* Update stage_for_label_box docs with a note for uploading files to cloud providers
* Added script to check/sync versions using CHANGELOG.md as a source of truth.
* Script currently only syncs __version__.py but can easily be extended to cover other files by adding the files to an array in the script.
* Also updated sphinx conf.py to get version dynamically from __version__.py
* Allow stage_for_label_studio to take a predictions input and implement prediction class
* Update unit tests for LabelStudioPrediction and stage_for_label_studio function
* Update stage_for_label_studio docs with example of loading predictions
* Bump version and update changelog
Co-authored-by: Matt Robinson <mrobinson@unstructured.io>
* Implement save_as_jsonl and read_from_jsonl utility functions
* Add unit tests for save_as_jsonl and read_from_jsonl utility functions
* Add example of using save_as_jsonl with prodigy staging brick
* Bump version and update changelog
* remove accidentally added prodigy json file
* added "the" in jsonl description
Co-authored-by: Matt Robinson <mrobinson@unstructuredai.io>
* added types for label studio annotations
* added method to cast as dicts
* added length check for annotations
* tweaks to get upload to work
* added validation for label types
* annotations is a list for each example
* little bit of refactoring
* test for staging with label studio
* tests for error conditions and reviewers
* added test for NER annotations
* updated changelog and bumped version
* added docs with annotation examples
* fix label studio link
* bump version in sphinx docs
* fulle -> full (typo fix)
* Refactor metadata validation and implement stage_csv_for_prodigy brick
* Refactor unit tests for metadata validation and add tests for Prodigy CSV brick
* Add stage_csv_for_prodigy description and example in docs
* Bump version and update changelog
* added _csv_ to function name
* update changelog line to 0.2.1-dev2
Co-authored-by: Matt Robinson <mrobinson@unstructuredai.io>
* Implement unit tests for stage_for_prodigy brick
* Implement brick for converting data to Prodigy format
* Add stage_for_prodigy description and example to docs
* updated changelog
Co-authored-by: Matt Robinson <mrobinson@unstructuredai.io>
Added text_field and id_field to stage_for_label_studio signature, to allow user to specify the keys in the resulting JSON. Includes tests and update to example in sphinx docs.
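
To illustrate what configurable key names mean for the resulting JSON, here is a minimal sketch. It does not call the library: `stage_texts` is a hypothetical stand-in, and the default key names, the nesting under `"data"`, and the use of the list index as the ID are all assumptions made for the example.

```python
import json
from typing import Dict, List


def stage_texts(
    texts: List[str],
    text_field: str = "text",   # assumed default key for the text
    id_field: str = "ref_id",   # assumed default key for the identifier
) -> List[Dict[str, Dict[str, str]]]:
    """Hypothetical stand-in: build one record per text with caller-chosen keys."""
    staged = []
    for index, text in enumerate(texts):
        # The list index serves as the ID here purely for illustration.
        staged.append({"data": {text_field: text, id_field: str(index)}})
    return staged


# The caller decides which keys appear in the output JSON.
records = stage_texts(["Hello", "World"], text_field="my_text", id_field="my_id")
print(json.dumps(records, indent=2))
```

Passing the key names as parameters lets the output match whatever field names a downstream labeling configuration expects, without post-processing the JSON.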