mirror of
https://github.com/Unstructured-IO/unstructured.git
synced 2025-12-29 08:05:08 +00:00
The current `test-ingest-src.sh` and `evaluation-metrics.sh` do not allow passing the `EXPORT_DIR` (`OUTPUT_ROOT` in `evaluation-metrics.sh`). Outputs are currently saved to the current working directory (`unstructured/test_unstructured_ingest`). When running the eval from `core-product`, all outputs are saved at `core-product/upstream-unstructured/test_unstructured_ingest`, which is undesirable.

This PR modifies two scripts to make the output directory configurable:

1. `test-ingest-src.sh` - assigns `EVAL_OUTPUT_ROOT` to the value set in the environment if it exists, or to the current working directory if not, then calls `evaluation-metrics.sh`.
2. `evaluation-metrics.sh` - accepts the param from `test-ingest-src.sh` if it exists, otherwise uses the value set in the environment if it exists, otherwise the current directory.

(Note: I also added the param to `evaluation-metrics.sh` because it makes sense to allow a separate run to specify its own export directory.)

This PR should work in sync with another PR under `core-product`; I will add the link here later.

**To test:**
Run the script below, changing `$SCRIPT_DIR` as needed to see the result.

```
export OVERWRITE_FIXTURES=true
./upstream-unstructured/test_unstructured_ingest/src/s3.sh
SCRIPT_DIR=$(dirname "$(realpath "$0")")
bash -x ./upstream-unstructured/test_unstructured_ingest/evaluation-metrics.sh text-extraction "$SCRIPT_DIR"
```

----

This PR also updates the requirements via `make pip-compile`, since the `click` module was not found.
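The fallback order described above (explicit param, then the environment value, then the current directory) could be sketched roughly as follows. This is only an illustration: the function name `resolve_output_root` is hypothetical and is not taken from the actual scripts, which may implement the same logic differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the real scripts) showing the fallback order:
#   1. an explicit argument, if one was passed
#   2. EVAL_OUTPUT_ROOT from the environment, if set and non-empty
#   3. the current working directory
resolve_output_root() {
  local arg="${1:-}"
  if [ -n "$arg" ]; then
    printf '%s\n' "$arg"
  else
    # ${VAR:-default} falls back when VAR is unset or empty
    printf '%s\n' "${EVAL_OUTPUT_ROOT:-$(pwd)}"
  fi
}
```

A caller would then use something like `OUTPUT_ROOT=$(resolve_output_root "$2")`, where the positional index is likewise an assumption about how the argument is passed.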
25 lines
447 B
Plaintext
#
# This file is autogenerated by pip-compile with Python 3.9
# by the following command:
#
#    pip-compile --output-file=extra-csv.txt extra-csv.in
#
numpy==1.26.4
    # via
    #   -c base.txt
    #   pandas
pandas==2.2.0
    # via -r extra-csv.in
python-dateutil==2.8.2
    # via
    #   -c base.txt
    #   pandas
pytz==2024.1
    # via pandas
six==1.16.0
    # via
    #   -c base.txt
    #   python-dateutil
tzdata==2024.1
    # via pandas