unstructured

mirror of https://github.com/Unstructured-IO/unstructured.git synced 2025-10-02 11:52:25 +00:00

Author	SHA1	Message	Date
Matt Robinson	77cd5cc01f	feat: text2text and token classification for argilla (#87) * add support for text2text * add support for token classification datasets * bump versions * updated docs * remove extra comment * fix wording in docs * fix some more wording	2022-11-30 20:07:42 +00:00
asymness	2170a2aae2	feat: Implement Argilla staging brick (#81) * Add argilla to dependencies and run pip-compile * Implement Argilla staging brick and add unit tests * Update version and changelog * Update docs with description and usage for Argilla staging brick * Remove unused fixtures and fix typo in Argilla tests * add missing quote in docs * changelog tweak * doc tweaks Co-authored-by: Matt Robinson <mrobinson@unstructuredai.io> Co-authored-by: Matt Robinson <mrobinson@unstructured.io>	2022-11-28 14:41:48 +00:00
Matt Robinson	b041b0197d	feat: Add entities kwarg to datasaur bricks (#77) * added entities to datasaur * add tests for datasaur with entities * update docs * fix missing imports * bump version * remove accidental file	2022-11-22 19:50:19 +00:00
Matt Robinson	08e091c5a9	chore: Reorganize partition bricks under partition directory (#76) * move partition_pdf to partition folder * move partition.py * refactor partioning bricks into partition diretory * import to nlp for backward compatibility * update docs * update version and bump changelog * fix typo in changelog * update readme reference	2022-11-21 22:27:23 +00:00
Mallori Harrell	53fcf4e912	chore: Remove PDF parsing code and dependencies (#75) Remove PDF parsing code and dependencies.	2022-11-21 11:47:29 -06:00
Sebastian Laverde Alfonso	baa15d0098	feat: new partitioning brick that calls the document image analysis API (#68) * docs: add new feature to the CHANGELOG.md, bump the version, update __version__.py * feat: new partition to call the document image analysis API * fix: remove duplicated dependency on partition.py * fix: linting error due to line-lenght > 100 * test: add test to call partition_pdf brick * chore: new short example-doc pdf for speed up in test X8 * fix: add missing return statement to _read to pass check * feat: new partitioning brick to call doc parse API * docs: version update fix in CHANGELOG * refactor: no nested ifs * docs: documentation for new brick partition_pdf * refactor: made tidy * docs: minor doc refactor Co-authored-by: Sebastian Laverde <sebastian@unstructured.io>	2022-11-16 17:48:30 +01:00
Matt Robinson	300c564c62	feat: Cleaning bricks to extract text before/after a pattern (#63) * brick to extract text before * brick for extract text after * tests for extract before and after * updated docs * changelog and bump version * fix typo * fix another typo * positive -> non-negative	2022-11-10 21:35:37 +00:00
Matt Robinson	f3756abc90	feat: Cleaning bricks for removing prefixes and postfixes (#62) * added prefix and postfix cleaners * added test for pre and postfix cleaners * added docs for prefix and postfix bricks * changelog and bump version * add dev to version	2022-11-10 12:24:58 -05:00
benjats07	df16b5806b	feat: Add staging brick for Datasaur token-based tasks (#50) * feat: Add staging brick for Datasaur token-based tasks * Added doc string and formatting with flake8,mypy and black * docs: Added documentation for stage_for_datasaur * fix: version sync correction * fix: Corrections to docs fror stage_for_datasaur * fix: changes in naming of example variables * Update docs/source/bricks.rst Co-authored-by: Matt Robinson <mrobinson@unstructured.io>	2022-11-07 14:56:02 -06:00
Matt Robinson	de31df51a9	feat: Adds a helper function to convert ISD dicts to elements (#39) * updated category name for ListItem * added brick to convert isd to elements * bump version * added isd_to_elements to documentation	2022-10-21 18:43:10 +00:00
asymness	2d5dba0ddc	feat: Implement staging brick for ISD CSV format (#36) * Implement convert_to_isd_csv function * Add unit tests for convert_to_isd_csv function * Update docs with description and example of convert_to_isd_csv function * Update changelog and version	2022-10-13 11:35:46 -04:00
Matt Robinson	fb16847946	feat: Staging brick for attention window chunking (#34) * add huggingface dependencies and re pip-compile * first pass on chunk by attention window * test for chunking function * completed tests for chunk_by_attention_window * change default buffer size to 2 * wrapper function for staging * added docs for transformers * fix wording and typos * updated change log and bumped the version * added docs on huggingface dependencies * fix typo * re pip-compile	2022-10-13 11:18:27 -04:00
asymness	ec5be8e8b0	feat: Implement LabelBox staging brick (#26) * Implement stage_for_label_box function * Add unit tests for stage_for_label_box function * Update docs with description and example for stage_for_label_box function * Bump version and update CHANGELOG.md * Fix linting issues and implement suggested changes * Update stage_for_label_box docs with a note for uploading files to cloud providers	2022-10-11 10:15:25 -04:00
qued	1d3076a4b2	feat: keep version synchronized (#25) * Added script to check/sync versions using CHANGELOG.md as a source of truth. * Script currently only syncs __version__.py but can easily be extended to cover other files by adding the files to an array in the script. * Also updated sphinx conf.py to get version dynamically from __version__.py	2022-10-10 13:11:48 -05:00
Matt Robinson	836f156582	docs: Add example LabelStudio sentiment analysis example (#24) * added documentation on how to use unstructured with labelstudio * hard code risk narrative for docs * link to create project call	2022-10-10 08:27:01 -04:00
asymness	baba641d03	feat: Allow option to specify predictions in LabelStudio staging brick (#23) * Allow stage_for_label_studio to take a predictions input and implement prediction class * Update unit tests for LabelStudioPrediction and stage_for_label_studio function * Update stage_for_label_studio docs with example of loading predictions * Bump version and update changelog Co-authored-by: Matt Robinson <mrobinson@unstructured.io>	2022-10-06 13:35:55 +00:00
Yuming Long	779e48bafe	chore: Integration test to show LabelStudio brick working with SDK (#21)	2022-10-05 14:38:44 -04:00
asymness	28a4ae985d	feat: Implement utility functions for reading and writing `.jsonl` files (#22) * Implement save_as_jsonl and read_from_jsonl utility functions * Add unit tests for save_as_jsonl and read_from_jsonl utility functions * Add example of using save_as_jsonl with prodigy staging brick * Bump version and update changelog * remove accidentally added prodigy json file * added "the" in jsonl description Co-authored-by: Matt Robinson <mrobinson@unstructuredai.io>	2022-10-04 09:51:11 -04:00
Matt Robinson	a950559b94	feat: Optionally include LabelStudio annotations in staging brick (#19) * added types for label studio annotations * added method to cast as dicts * added length check for annotations * tweaks to get upload to work * added validation for label types * annotations is a list for each example * little bit of refactoring * test for staging with label studio * tests for error conditions and reviewers * added test for NER annotations * updated changelog and bumped version * added docs with annotation examples * fix label studio link * bump version in sphinx docs * fulle -> full (typo fix)	2022-10-04 13:25:05 +00:00
asymness	d429e9b305	feat: Implement `stage_csv_for_prodigy` brick (#13) * Refactor metadata validation and implement stage_csv_for_prodigy brick * Refactor unit tests for metadata validation and add tests for Prodigy CSV brick * Add stage_csv_for_prodigy description and example in docs * Bump version and update changelog * added _csv_ to function name * update changelog line to 0.2.1-dev2 Co-authored-by: Matt Robinson <mrobinson@unstructuredai.io>	2022-10-03 09:30:30 -04:00
asymness	35d488a466	feat: Implement stage_for_prodigy brick (#11) * Implement unit tests for stage_for_prodigy brick * Implement brick for converting data to Prodigy format * Add stage_for_prodigy description and example to docs * updated changelog Co-authored-by: Matt Robinson <mrobinson@unstructuredai.io>	2022-09-30 12:41:37 -04:00
qued	64e1c725eb	feat: Add text_field and id_field to stage_for_label_studio signature (#9) Added text_field and id_field to stage_for_label_studio signature, to allow user to specify the keys in the resulting JSON. Includes tests and update to example in sphinx docs.	2022-09-28 09:30:17 -05:00
Matt Robinson	5f40c78f25	Initial Release	2022-09-26 14:55:20 -07:00


73 Commits