unstructured

mirror of https://github.com/Unstructured-IO/unstructured.git synced 2025-06-27 02:30:08 +00:00

Author	SHA1	Message	Date
cragwolfe	bd8a74d686	chore: shell scripts default indent of 2 instead of 4 (#2287) Given how easily shell scripts reach several levels of indentation and long line lengths, update the default indent to 2 spaces.	2023-12-19 07:48:21 +00:00
Roman Isecke	76efcf4dd7	chore: add shfmt (#2246) ### Description Given all the shell files that now exist in the repo, it would be nice to have linting/formatting for them (in addition to the existing shellcheck, which does not format shell code). This PR introduces `shfmt` to both check for formatting changes and apply formatting when the associated make targets are called.	2023-12-12 01:04:15 +00:00
Klaijan	877a30aed3	fix: fix eval ci to skip the overwrite if none exists (#2159) Currently, `check-diff-evaluation-metrics` only runs when there are files to evaluate. Add a condition that skips the action when there are none. This PR also includes further refactoring and adds a `visualize` option to both evaluation calculation functions.	2023-11-25 15:46:05 +00:00
Klaijan	2c2d5b65ca	refactor: measure_text_edit_distance function for aggregation (#2108) - Refactor `metrics/evaluation.py` to accept `grouping` as a parameter. - Switch to `DataFrame` for easier analysis and aggregation.	2023-11-22 13:30:16 -08:00
Klaijan	366c8af2ae	ci: make eval fail on diff (#2138) Add conditions to `check-diff-evaluation-metrics.sh` that exit when there is a diff between the new evaluation metric outputs and the old ones.	2023-11-21 20:55:03 -08:00
Klaijan	433c3889dc	ci: reorganize eval output folders and add azure to matrix test (#2093) Summary: The CI workflow for evaluation previously saved the metric outputs directly to the `metrics/` folder. They are now organized into subfolders, e.g. `metrics/text-extraction` and `metrics/element-type`, to keep the folder clean. Additionally, the Azure connector is added to `full_python_matrix_tests` in this PR. --------- Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com> Co-authored-by: Klaijan <Klaijan@users.noreply.github.com>	2023-11-21 20:04:30 +00:00
Klaijan	5ba3b9c2c6	chore: get eval metrics from ingest in (#2097) Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com> Co-authored-by: Klaijan <Klaijan@users.noreply.github.com>	2023-11-17 18:22:36 +00:00
Klaijan	777a428071	chore: for ingest-test metrics, also check subdirs (#2079) - The copy script only searched one level of subdirectories, so it did not find the match between the manifest file and the structured output. It now searches all subdirectories. - `set -e` causes the script to exit on any non-zero exit status instead of reaching `exit 0`; fix all scripts that need to run the copy script to call `set +e` right before the diff check, then `set -e` again afterward. - Change the default evaluation metrics output from `metrics` to `metrics-tmp` so the diff check can compare the two. - Add a script that checks the differences between the old eval metrics output (`metrics`) and the new eval metrics output (`metrics-tmp`).	2023-11-15 21:02:43 -08:00
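Commits #2079, #2138 and #2159 describe comparing the old eval metrics output (`metrics`) with the new one (`metrics-tmp`), failing when they differ and skipping when there is nothing to compare. The repo does this in the shell script `check-diff-evaluation-metrics.sh`; the Python below is only a hypothetical sketch of that logic, not its implementation. The names `check_diff`, `dirs_differ` and `_has_diff` are illustrative, and treating a missing or empty `metrics-tmp` as "skip" is an assumption.

```python
import filecmp
from pathlib import Path

# Hypothetical Python analogue of check-diff-evaluation-metrics.sh; not the repo's code.


def _has_diff(cmp: filecmp.dircmp) -> bool:
    """Recursively report whether two directory trees differ."""
    # Files or folders present on only one side count as a diff.
    if cmp.left_only or cmp.right_only or cmp.funny_files:
        return True
    # dircmp compares shallowly (os.stat signatures), so re-check file contents byte by byte.
    _, mismatch, errors = filecmp.cmpfiles(
        cmp.left, cmp.right, cmp.common_files, shallow=False
    )
    if mismatch or errors:
        return True
    # Descend into every subfolder, e.g. metrics/text-extraction, metrics/element-type.
    return any(_has_diff(sub) for sub in cmp.subdirs.values())


def dirs_differ(old: Path, new: Path) -> bool:
    return _has_diff(filecmp.dircmp(old, new))


def check_diff(old_dir: str, new_dir: str) -> int:
    """Return an exit status: 0 if skipped or identical, 1 if the outputs differ."""
    old, new = Path(old_dir), Path(new_dir)
    # Skip the check when there is no new output to compare (assumed behavior).
    if not new.is_dir() or not any(new.rglob("*")):
        return 0
    # New output with no old baseline is treated as a diff.
    if not old.is_dir():
        return 1
    return 1 if dirs_differ(old, new) else 0
```

A CI step could pass the return value to `sys.exit(...)` so that any difference fails the job, mirroring "make eval fail on diff".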

8 Commits
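Commit #2108 switches the evaluation code to a `DataFrame` and adds a `grouping` parameter for aggregation. In pandas that step is roughly `df.groupby(grouping)[metric].mean()`. The stdlib sketch below shows the same group-then-average idea without the dependency. The `aggregate` function, the row layout and the `doctype`/`score` column names are hypothetical, not taken from `metrics/evaluation.py`.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical illustration of grouping per-document metrics; not the repo's code.


def aggregate(rows: list[dict], grouping: str, metric: str) -> dict:
    """Average `metric` over rows that share the same value in column `grouping`."""
    groups: dict = defaultdict(list)
    for row in rows:
        groups[row[grouping]].append(row[metric])
    return {key: mean(values) for key, values in groups.items()}


rows = [
    {"doctype": "pdf", "score": 0.5},
    {"doctype": "pdf", "score": 1.0},
    {"doctype": "html", "score": 0.25},
]
print(aggregate(rows, "doctype", "score"))  # {'pdf': 0.75, 'html': 0.25}
```

Passing the grouping column as a parameter lets the same function report metrics per document type, per connector or per any other column without separate code paths.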