From d79f633adafe276bcd3ac721fd33c0b0ecc2c914 Mon Sep 17 00:00:00 2001
From: qued <64741807+qued@users.noreply.github.com>
Date: Tue, 24 Oct 2023 14:19:09 -0500
Subject: [PATCH] build(deps): add typing extensions dep (#1835)

Closes #1330.

Added `typing-extensions` as an explicit dependency (it was previously
an implicit dependency via `dataclasses-json`). This dependency should
be explicit, since we import from it directly in
`unstructured.documents.elements`. This has the added benefit that
`TypedDict` will be available for Python 3.7 users.

Other changes:
* Ran `pip-compile`
* Fixed a bug in `version-sync.sh` that caused an error when syncing to
a dev version from a release version.

#### Testing:
To test the Python 3.7 functionality, in a Python 3.7 environment
install the base requirements and run

```python
from unstructured.documents.elements import Element
```

This also works on `main`, because `typing-extensions` is already
installed there as an implicit requirement. However, if you
`pip uninstall typing-extensions` and run the above code, it should
fail. So this update makes sure `typing-extensions` doesn't get lost if
the other dependencies move around.

To reproduce the `version-sync.sh` bug that was fixed, in `main`,
increment the most recent version in `CHANGELOG.md` while leaving the
version in `__version__.py` unchanged. Then add the following lines to
`version-sync.sh` to simulate a particular set of circumstances,
starting on line 114:

```
MAIN_IS_RELEASE=true
CURRENT_BRANCH="something-not-main"
```

Then run `make version-sync`.

The expected behavior is that the version in `__version__.py` is changed
to the new version to match `CHANGELOG.md`, but instead it exits with an
error.

The fix was to run the version increment check only when the script is
running in `-c` ("check") mode.
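
For reviewers who want the new branching in `version-sync.sh` at a glance,
here is a minimal Python sketch of the decision the script now makes. It is
an illustration only, not code from the repository: the function name
`duplicate_release_action` and its return values are hypothetical, and the
version strings in the usage lines are made-up examples. The condition
itself mirrors the `if` added in `scripts/version-sync.sh`.

```python
def duplicate_release_action(
    main_is_release: bool,
    updated_version: str,
    main_version: str,
    current_branch: str,
    check_mode: bool,
) -> str:
    """Return "ok", "warning", or "error" (hypothetical labels).

    Mirrors the shell condition:
      MAIN_IS_RELEASE == true && UPDATED_VERSION == MAIN_VERSION
      && CURRENT_BRANCH != "main"
    """
    duplicate_release = (
        main_is_release
        and updated_version == main_version
        and current_branch != "main"
    )
    if not duplicate_release:
        return "ok"
    # Only check mode (-c) treats the duplicate as fatal;
    # sync mode prints a warning and continues with the replacement.
    return "error" if check_mode else "warning"


# Illustrative calls with made-up version strings.
print(duplicate_release_action(True, "1.2.3", "1.2.3", "something-not-main", True))
print(duplicate_release_action(True, "1.2.3", "1.2.3", "something-not-main", False))
print(duplicate_release_action(True, "1.2.4-dev0", "1.2.3", "something-not-main", False))
```

In the reproduction scenario above, the version taken from `CHANGELOG.md`
differs from the release version on `main`, so the sketch returns "ok" and
the sync proceeds.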
---
 CHANGELOG.md                |  3 ++-
 requirements/base.in        |  1 +
 requirements/base.txt       |  4 +++-
 scripts/version-sync.sh     | 20 ++++++++++++++------
 unstructured/__version__.py |  2 +-
 5 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 59c88b4cf..1512e3555 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,4 +1,4 @@
-## 0.10.26-dev4
+## 0.10.26-dev5
 
 ### Enhancements
 
@@ -10,6 +10,7 @@
 
 ### Fixes
 
+* **Adds `typing-extensions` as an explicit dependency** This package is an implicit dependency, but the module is being imported directly in `unstructured.documents.elements` so the dependency should be explicit in case changes in other dependencies lead to `typing-extensions` being dropped as a dependency.
 * ** Stop passing `extract_tables` to unstructured-inference ** since it is now supported in unstructured instead. Also noted the table output regressioin for PDF files.
 * **Fix a bug on Table partitioning** Previously the `skip_infer_table_types` variable used in partition was not being passed down to specific file partitioners. Now you can utilize the `skip_infer_table_types` list variable in partition to pass the filetype you want to exclude `text_as_html` metadata field for, or the `infer_table_structure` boolean variable on the file specific partitioning function.
 
diff --git a/requirements/base.in b/requirements/base.in
index a52b9d93d..bbbdddf20 100644
--- a/requirements/base.in
+++ b/requirements/base.in
@@ -14,3 +14,4 @@ langdetect
 numpy
 rapidfuzz
 backoff
+typing-extensions
diff --git a/requirements/base.txt b/requirements/base.txt
index 4be522c96..5644ad293 100644
--- a/requirements/base.txt
+++ b/requirements/base.txt
@@ -63,7 +63,9 @@ tabulate==0.9.0
 tqdm==4.66.1
     # via nltk
 typing-extensions==4.8.0
-    # via typing-inspect
+    # via
+    #   -r requirements/base.in
+    #   typing-inspect
 typing-inspect==0.9.0
     # via dataclasses-json
 urllib3==1.26.18
diff --git a/scripts/version-sync.sh b/scripts/version-sync.sh
index bd265cb88..a99f4da3d 100755
--- a/scripts/version-sync.sh
+++ b/scripts/version-sync.sh
@@ -1,6 +1,6 @@
 #!/usr/bin/env bash
 
-set -eu
+set -u
 
 function usage {
     echo "Usage: $(basename "$0") [-c] -f FILE_TO_CHANGE REPLACEMENT_FORMAT [-f FILE_TO_CHANGE REPLACEMENT_FORMAT ...]" 2>&1
@@ -123,12 +123,19 @@ for i in "${!FILES_TO_CHECK[@]}"; do
         # No match to semver regex in VERSIONFILE, so nothing to replace
         printf "Error: No semver version found in file %s.\n" "$FILE_TO_CHANGE"
         exit 1
-    elif [[ "$MAIN_IS_RELEASE" == true && "$FILE_VERSION" == "$MAIN_VERSION" && "$CURRENT_BRANCH" != "main" ]];
-    then
-        # Only one commit should be associated with a particular non-dev version
-        printf "Error: there is already a commit associated with version %s.\n" "$MAIN_VERSION"
-        exit 1
     else
+        if [[ "$MAIN_IS_RELEASE" == true && "$UPDATED_VERSION" == "$MAIN_VERSION" && "$CURRENT_BRANCH" != "main" ]];
+        then
+            # Only one commit should be associated with a particular non-dev version
+            if [[ "$CHECK" == 1 ]];
+            then
+                printf "Error: there is already a commit associated with version %s.\n" "$MAIN_VERSION"
+                exit 1
+            else
+                printf "Warning: there is already a commit associated with version %s.\n" "$MAIN_VERSION"
+            fi
+        fi
+
         # Replace semver in VERSIONFILE with semver obtained from SOURCE_FILE
         TMPFILE=$(mktemp /tmp/new_version.XXXXXX)
         # Check sed version, exit if version < 4.3
@@ -163,6 +170,7 @@ done
 
 # Exit with code determined by whether changes were needed in a check.
 if [ ${FAILED_CHECK} -ne 0 ]; then
+    printf "\nVersions are out of sync! See above for diffs.\n"
     exit 1
 else
     exit 0
diff --git a/unstructured/__version__.py b/unstructured/__version__.py
index 024a76a65..6ade83efc 100644
--- a/unstructured/__version__.py
+++ b/unstructured/__version__.py
@@ -1 +1 @@
-__version__ = "0.10.26-dev4"  # pragma: no cover
+__version__ = "0.10.26-dev5"  # pragma: no cover