1739 Commits

Author SHA1 Message Date
Matt Robinson
de31df51a9
feat: Adds a helper function to convert ISD dicts to elements (#39)
* updated category name for ListItem

* added brick to convert isd to elements

* bump version

* added isd_to_elements to documentation
0.2.1
2022-10-21 18:43:10 +00:00
dependabot[bot]
2871941a80
build(deps): Bump sphinx from 5.2.3 to 5.3.0 in /requirements (#37)
Bumps [sphinx](https://github.com/sphinx-doc/sphinx) from 5.2.3 to 5.3.0.
- [Release notes](https://github.com/sphinx-doc/sphinx/releases)
- [Changelog](https://github.com/sphinx-doc/sphinx/blob/master/CHANGES)
- [Commits](https://github.com/sphinx-doc/sphinx/compare/v5.2.3...v5.3.0)

---
updated-dependencies:
- dependency-name: sphinx
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Matt Robinson <mrobinson@unstructured.io>
2022-10-17 17:27:36 +00:00
dependabot[bot]
eba4e8b144
build(deps): Bump numpy from 1.23.3 to 1.23.4 in /requirements (#38)
Bumps [numpy](https://github.com/numpy/numpy) from 1.23.3 to 1.23.4.
- [Release notes](https://github.com/numpy/numpy/releases)
- [Changelog](https://github.com/numpy/numpy/blob/main/doc/RELEASE_WALKTHROUGH.rst)
- [Commits](https://github.com/numpy/numpy/compare/v1.23.3...v1.23.4)

---
updated-dependencies:
- dependency-name: numpy
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-17 17:19:44 +00:00
Matt Robinson
704d6e11d1
chore: Update PDFDocument to use from_file method (#35)
* update PDFDocument to use from_file method

* bump version
2022-10-13 16:04:30 +00:00
asymness
2d5dba0ddc
feat: Implement staging brick for ISD CSV format (#36)
* Implement convert_to_isd_csv function

* Add unit tests for convert_to_isd_csv function

* Update docs with description and example of convert_to_isd_csv function

* Update changelog and version
2022-10-13 11:35:46 -04:00
Matt Robinson
fb16847946
feat: Staging brick for attention window chunking (#34)
* add huggingface dependencies and re pip-compile

* first pass on chunk by attention window

* test for chunking function

* completed tests for chunk_by_attention_window

* change default buffer size to 2

* wrapper function for staging

* added docs for transformers

* fix wording and typos

* updated change log and bumped the version

* added docs on huggingface dependencies

* fix typo

* re pip-compile
2022-10-13 11:18:27 -04:00
asymness
ec5be8e8b0
feat: Implement LabelBox staging brick (#26)
* Implement stage_for_label_box function

* Add unit tests for stage_for_label_box function

* Update docs with description and example for stage_for_label_box function

* Bump version and update CHANGELOG.md

* Fix linting issues and implement suggested changes

* Update stage_for_label_box docs with a note for uploading files to cloud providers
2022-10-11 10:15:25 -04:00
dependabot[bot]
546865fd64
build(deps): Bump scipy from 1.9.1 to 1.9.2 in /requirements (#32)
Bumps [scipy](https://github.com/scipy/scipy) from 1.9.1 to 1.9.2.
- [Release notes](https://github.com/scipy/scipy/releases)
- [Commits](https://github.com/scipy/scipy/compare/v1.9.1...v1.9.2)

---
updated-dependencies:
- dependency-name: scipy
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: qued <64741807+qued@users.noreply.github.com>
2022-10-10 16:15:26 -05:00
dependabot[bot]
5ae06e9026
build(deps): Bump matplotlib from 3.6.0 to 3.6.1 in /requirements (#33)
Bumps [matplotlib](https://github.com/matplotlib/matplotlib) from 3.6.0 to 3.6.1.
- [Release notes](https://github.com/matplotlib/matplotlib/releases)
- [Commits](https://github.com/matplotlib/matplotlib/compare/v3.6.0...v3.6.1)

---
updated-dependencies:
- dependency-name: matplotlib
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: qued <64741807+qued@users.noreply.github.com>
2022-10-10 15:10:50 -05:00
qued
1d3076a4b2
feat: keep version synchronized (#25)
* Added script to check/sync versions using CHANGELOG.md as a source of truth.
* Script currently only syncs __version__.py but can easily be extended to cover other files by adding the files to an array in the script.
* Also updated sphinx conf.py to get version dynamically from __version__.py
2022-10-10 13:11:48 -05:00
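Note on the version sync described in #25: the commit message says CHANGELOG.md is the source of truth and that `__version__.py` is brought in line with it. A minimal Python sketch of that idea follows. It is not the repository's actual script. The heading format (`## 0.2.1-dev2`), the `__version__ = "..."` assignment format, and the function names `changelog_version`, `is_synced`, and `sync_version` are all assumptions made for illustration.

```python
import re
from typing import Optional

# Assumed changelog heading format, e.g. "## 0.2.1" or "## 0.2.1-dev2".
HEADING = re.compile(r"^##[ \t]+(\S+)[ \t]*$", re.MULTILINE)
# Assumed version file format: __version__ = "0.2.1"
ASSIGNMENT = re.compile(r'^(__version__\s*=\s*)["\']([^"\']*)["\']', re.MULTILINE)


def changelog_version(changelog_text: str) -> Optional[str]:
    """Return the version named in the first (most recent) changelog heading."""
    match = HEADING.search(changelog_text)
    return match.group(1) if match else None


def is_synced(version_file_text: str, version: str) -> bool:
    """Check whether the version file already declares the given version."""
    match = ASSIGNMENT.search(version_file_text)
    return bool(match) and match.group(2) == version


def sync_version(version_file_text: str, version: str) -> str:
    """Rewrite the __version__ assignment so it matches the given version."""
    return ASSIGNMENT.sub(
        lambda m: f'{m.group(1)}"{version}"', version_file_text, count=1
    )
```

Covering more files, as the commit message suggests, would amount to running the same check and rewrite over a list of paths.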
dependabot[bot]
6d16caafcd
build(deps): Bump pytz from 2022.2.1 to 2022.4 in /requirements (#31)
Bumps [pytz](https://github.com/stub42/pytz) from 2022.2.1 to 2022.4.
- [Release notes](https://github.com/stub42/pytz/releases)
- [Commits](https://github.com/stub42/pytz/compare/release_2022.2.1...release_2022.4)

---
updated-dependencies:
- dependency-name: pytz
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Matt Robinson <mrobinson@unstructured.io>
2022-10-10 17:39:32 +00:00
dependabot[bot]
8a1ec69359
build(deps-dev): Bump pip-tools from 6.8.0 to 6.9.0 in /requirements (#30)
Bumps [pip-tools](https://github.com/jazzband/pip-tools) from 6.8.0 to 6.9.0.
- [Release notes](https://github.com/jazzband/pip-tools/releases)
- [Changelog](https://github.com/jazzband/pip-tools/blob/master/CHANGELOG.md)
- [Commits](https://github.com/jazzband/pip-tools/compare/6.8.0...6.9.0)

---
updated-dependencies:
- dependency-name: pip-tools
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Matt Robinson <mrobinson@unstructured.io>
2022-10-10 17:33:55 +00:00
dependabot[bot]
7664bbfa17
build(deps): Bump timm from 0.6.7 to 0.6.11 in /requirements (#29)
Bumps [timm](https://github.com/rwightman/pytorch-image-models) from 0.6.7 to 0.6.11.
- [Release notes](https://github.com/rwightman/pytorch-image-models/releases)
- [Changelog](https://github.com/rwightman/pytorch-image-models/blob/master/docs/changes.md)
- [Commits](https://github.com/rwightman/pytorch-image-models/compare/v0.6.7...v0.6.11)

---
updated-dependencies:
- dependency-name: timm
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Matt Robinson <mrobinson@unstructured.io>
2022-10-10 17:28:28 +00:00
dependabot[bot]
2acff17f28
build(deps): Bump black from 22.8.0 to 22.10.0 in /requirements (#28)
Bumps [black](https://github.com/psf/black) from 22.8.0 to 22.10.0.
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](https://github.com/psf/black/compare/22.8.0...22.10.0)

---
updated-dependencies:
- dependency-name: black
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Matt Robinson <mrobinson@unstructured.io>
2022-10-10 17:20:56 +00:00
dependabot[bot]
9f3fcea282
build(deps): Bump typing-extensions from 4.3.0 to 4.4.0 in /requirements (#27)
Bumps [typing-extensions](https://github.com/python/typing_extensions) from 4.3.0 to 4.4.0.
- [Release notes](https://github.com/python/typing_extensions/releases)
- [Changelog](https://github.com/python/typing_extensions/blob/main/CHANGELOG.md)
- [Commits](https://github.com/python/typing_extensions/compare/4.3.0...4.4.0)

---
updated-dependencies:
- dependency-name: typing-extensions
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-10 13:15:07 -04:00
Matt Robinson
836f156582
docs: Add example LabelStudio sentiment analysis example (#24)
* added documentation on how to use unstructured with labelstudio

* hard code risk narrative for docs

* link to create project call
2022-10-10 08:27:01 -04:00
asymness
baba641d03
feat: Allow option to specify predictions in LabelStudio staging brick (#23)
* Allow stage_for_label_studio to take a predictions input and implement prediction class

* Update unit tests for LabelStudioPrediction and stage_for_label_studio function

* Update stage_for_label_studio docs with example of loading predictions

* Bump version and update changelog

Co-authored-by: Matt Robinson <mrobinson@unstructured.io>
2022-10-06 13:35:55 +00:00
Yuming Long
779e48bafe
chore: Integration test to show LabelStudio brick working with SDK (#21)
2022-10-05 14:38:44 -04:00
asymness
28a4ae985d
feat: Implement utility functions for reading and writing .jsonl files (#22)
* Implement save_as_jsonl and read_from_jsonl utility functions

* Add unit tests for save_as_jsonl and read_from_jsonl utility functions

* Add example of using save_as_jsonl with prodigy staging brick

* Bump version and update changelog

* remove accidentally added prodigy json file

* added "the" in jsonl description

Co-authored-by: Matt Robinson <mrobinson@unstructuredai.io>
2022-10-04 09:51:11 -04:00
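Note on the .jsonl utilities from #22: JSON Lines stores one JSON object per line, which is why a list of dicts round-trips through it easily. The sketch below shows that round trip with the standard library only. The names `write_jsonl` and `load_jsonl` are hypothetical stand-ins, and the signatures of the library's own `save_as_jsonl` and `read_from_jsonl` are not shown in this log and may differ.

```python
import json
from typing import Any, Dict, List


def write_jsonl(records: List[Dict[str, Any]], filename: str) -> None:
    """Write each record as one JSON object per line (hypothetical helper)."""
    with open(filename, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def load_jsonl(filename: str) -> List[Dict[str, Any]]:
    """Read a .jsonl file back into a list of dicts, skipping blank lines."""
    with open(filename, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```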
Matt Robinson
a950559b94
feat: Optionally include LabelStudio annotations in staging brick (#19)
* added types for label studio annotations

* added method to cast as dicts

* added length check for annotations

* tweaks to get upload to work

* added validation for label types

* annotations is a list for each example

* little bit of refactoring

* test for staging with label studio

* tests for error conditions and reviewers

* added test for NER annotations

* updated changelog and bumped version

* added docs with annotation examples

* fix label studio link

* bump version in sphinx docs

* fulle -> full (typo fix)
2022-10-04 13:25:05 +00:00
dependabot[bot]
29607c32ba
build(deps): Bump sphinx from 5.2.2 to 5.2.3 in /requirements (#20)
Bumps [sphinx](https://github.com/sphinx-doc/sphinx) from 5.2.2 to 5.2.3.
- [Release notes](https://github.com/sphinx-doc/sphinx/releases)
- [Changelog](https://github.com/sphinx-doc/sphinx/blob/5.x/CHANGES)
- [Commits](https://github.com/sphinx-doc/sphinx/compare/v5.2.2...v5.2.3)

---
updated-dependencies:
- dependency-name: sphinx
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: cragwolfe <cragcw@gmail.com>
2022-10-03 16:05:44 -07:00
dependabot[bot]
8f822f187c
build(deps): Bump pdfplumber from 0.7.4 to 0.7.5 in /requirements (#14)
Bumps [pdfplumber](https://github.com/jsvine/pdfplumber) from 0.7.4 to 0.7.5.
- [Release notes](https://github.com/jsvine/pdfplumber/releases)
- [Changelog](https://github.com/jsvine/pdfplumber/blob/stable/CHANGELOG.md)
- [Commits](https://github.com/jsvine/pdfplumber/compare/v0.7.4...v0.7.5)

---
updated-dependencies:
- dependency-name: pdfplumber
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Matt Robinson <mrobinson@unstructured.io>
2022-10-03 18:35:26 +00:00
dependabot[bot]
14de7ff763
build(deps): Bump fonttools from 4.37.3 to 4.37.4 in /requirements (#15)
Bumps [fonttools](https://github.com/fonttools/fonttools) from 4.37.3 to 4.37.4.
- [Release notes](https://github.com/fonttools/fonttools/releases)
- [Changelog](https://github.com/fonttools/fonttools/blob/main/NEWS.rst)
- [Commits](https://github.com/fonttools/fonttools/compare/4.37.3...4.37.4)

---
updated-dependencies:
- dependency-name: fonttools
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Matt Robinson <mrobinson@unstructured.io>
2022-10-03 18:26:08 +00:00
dependabot[bot]
3b556faffc
build(deps): Bump pytz from 2022.2.1 to 2022.4 in /requirements (#16)
Bumps [pytz](https://github.com/stub42/pytz) from 2022.2.1 to 2022.4.
- [Release notes](https://github.com/stub42/pytz/releases)
- [Commits](https://github.com/stub42/pytz/commits)

---
updated-dependencies:
- dependency-name: pytz
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Matt Robinson <mrobinson@unstructured.io>
2022-10-03 18:20:44 +00:00
dependabot[bot]
737215f58c
build(deps): Bump pytest-cov from 3.0.0 to 4.0.0 in /requirements (#17)
Bumps [pytest-cov](https://github.com/pytest-dev/pytest-cov) from 3.0.0 to 4.0.0.
- [Release notes](https://github.com/pytest-dev/pytest-cov/releases)
- [Changelog](https://github.com/pytest-dev/pytest-cov/blob/master/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest-cov/compare/v3.0.0...v4.0.0)

---
updated-dependencies:
- dependency-name: pytest-cov
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Matt Robinson <mrobinson@unstructured.io>
2022-10-03 18:15:19 +00:00
dependabot[bot]
192731253e
build(deps): Bump mypy from 0.981 to 0.982 in /requirements (#18)
Bumps [mypy](https://github.com/python/mypy) from 0.981 to 0.982.
- [Release notes](https://github.com/python/mypy/releases)
- [Commits](https://github.com/python/mypy/compare/v0.981...v0.982)

---
updated-dependencies:
- dependency-name: mypy
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-03 14:10:09 -04:00
asymness
d429e9b305
feat: Implement stage_csv_for_prodigy brick (#13)
* Refactor metadata validation and implement stage_csv_for_prodigy brick

* Refactor unit tests for metadata validation and add tests for Prodigy CSV brick

* Add stage_csv_for_prodigy description and example in docs

* Bump version and update changelog

* added _csv_ to function name

* update changelog line to 0.2.1-dev2

Co-authored-by: Matt Robinson <mrobinson@unstructuredai.io>
2022-10-03 09:30:30 -04:00
Matt Robinson
90d4f40da8
chore: Bump __version__.py version to match CHANGELOG (#12)
2022-09-30 12:56:40 -04:00
asymness
35d488a466
feat: Implement stage_for_prodigy brick (#11)
* Implement unit tests for stage_for_prodigy brick

* Implement brick for converting data to Prodigy format

* Add stage_for_prodigy description and example to docs

* updated changelog

Co-authored-by: Matt Robinson <mrobinson@unstructuredai.io>
2022-09-30 12:41:37 -04:00
Yuming Long
8eba1b6006
feat: Add shellcheck to CI and Make target (#10)
2022-09-29 15:24:28 -04:00
qued
64e1c725eb
feat: Add text_field and id_field to stage_for_label_studio signature (#9)
Added text_field and id_field to stage_for_label_studio signature, to allow user to specify the keys in the resulting JSON. Includes tests and update to example in sphinx docs.
2022-09-28 09:30:17 -05:00
Matt Robinson
212a98003a
build(security): Configure CodeQL scans (#8)
2022-09-27 17:06:23 -04:00
dependabot[bot]
8c364ccb86
build(deps): Bump sphinx from 5.1.1 to 5.2.2 in /requirements (#5)
Bumps [sphinx](https://github.com/sphinx-doc/sphinx) from 5.1.1 to 5.2.2.
- [Release notes](https://github.com/sphinx-doc/sphinx/releases)
- [Changelog](https://github.com/sphinx-doc/sphinx/blob/5.x/CHANGES)
- [Commits](https://github.com/sphinx-doc/sphinx/compare/v5.1.1...v5.2.2)

---
updated-dependencies:
- dependency-name: sphinx
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-27 13:42:10 -04:00
dependabot[bot]
7f16e33d0d
build(deps): Bump mypy from 0.971 to 0.981 in /requirements (#6)
Bumps [mypy](https://github.com/python/mypy) from 0.971 to 0.981.
- [Release notes](https://github.com/python/mypy/releases)
- [Commits](https://github.com/python/mypy/compare/v0.971...v0.981)

---
updated-dependencies:
- dependency-name: mypy
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-27 13:38:53 -04:00
dependabot[bot]
ae87cb92cd
build(deps): Bump certifi from 2022.9.14 to 2022.9.24 in /requirements (#4)
Bumps [certifi](https://github.com/certifi/python-certifi) from 2022.9.14 to 2022.9.24.
- [Release notes](https://github.com/certifi/python-certifi/releases)
- [Commits](https://github.com/certifi/python-certifi/compare/2022.09.14...2022.09.24)

---
updated-dependencies:
- dependency-name: certifi
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-27 13:37:14 -04:00
Matt Robinson
ca0140df59
fix: Change the Dependabot ecosystem from pip-compile to pip
2022-09-27 13:35:01 -04:00
Matt Robinson
7c82adf775
build: Configure dependabot scan
2022-09-27 12:44:33 -04:00
Matt Robinson
e290f085af
docs: Link to security policy in the README
2022-09-27 10:32:55 -04:00
Matt Robinson
5f40c78f25
Initial Release 0.2.0
2022-09-26 14:55:20 -07:00