unstructured/test_unstructured_ingest/test-ingest-confluence-diff.sh

31 lines
1.1 KiB
Bash
Raw Normal View History

feat: confluence connector (cloud) (#906) * Add confluence connector and an example script * add test script, add dependency installations * add authentication secret variables for ci tests and actions * add dependency installation commands for workflows * add dependency installation commands for workflows * Update ingest test fixtures (#907) Co-authored-by: ahmetmeleq <ahmetmeleq@users.noreply.github.com> * add add ingest test fixtures update workflow for python 3.10, update example script with dummy values * change workflow name to avoid confusion * change workflow name to avoid confusion * only leave 3.8 in ingest test matrix to test consistent partitioning among python versions, remove 3.10 workflow for the test fixtures update * only leave 3.8 in ingest test matrix to test consistent partitioning among python versions * Update ingest test fixtures (#911) Co-authored-by: ahmetmeleq <ahmetmeleq@users.noreply.github.com> * revert back the test python version matrix * recompile dependencies * modifications for shellcheck * update changelog and version * changelog and version * remove comments * Update ingest test fixtures (#915) Co-authored-by: ahmetmeleq <ahmetmeleq@users.noreply.github.com> * add the option to state the number of spaces to be fetched * add scroll functionality, expose --confluence-num-of-spaces, --confluence-list-of-spaces and --confluence-num-of-docs-from-each-space to users * add help message * add docstrings for two tests, validate grabbing every doc in the fetched spaces, count number of files instead of diffing for confluence2 test * change test names * rename connector arg Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com> * change arg name for connector Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com> * add comment to example * change arg names * add new tests to ingest test * shellcheck remove redundant statement * Update ingest test fixtures (#932) Co-authored-by: ahmetmeleq <ahmetmeleq@users.noreply.github.com> * Update ingest test fixtures (#936) Co-authored-by: ahmetmeleq <ahmetmeleq@users.noreply.github.com> * linting * change file extensions to parse as html * Update ingest test fixtures (#943) Co-authored-by: ahmetmeleq <ahmetmeleq@users.noreply.github.com> * remove old fixtures * update version to 0.8.2-dev3 * change file to trigger CI * change file to trigger CI * change file to trigger CI * change file to trigger CI --------- Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com> Co-authored-by: ahmetmeleq <ahmetmeleq@users.noreply.github.com>
2023-07-18 19:29:41 +01:00
#!/usr/bin/env bash
set -e
# Description: This test checks if all the processed content is the same as the expected outputs.
SCRIPT_DIR=$(dirname "$(realpath "$0")")
cd "$SCRIPT_DIR"/.. || exit 1
OUTPUT_FOLDER_NAME=confluence-diff
OUTPUT_DIR=$SCRIPT_DIR/structured-output/$OUTPUT_FOLDER_NAME
DOWNLOAD_DIR=$SCRIPT_DIR/download/$OUTPUT_FOLDER_NAME
if [ -z "$CONFLUENCE_USER_EMAIL" ] || [ -z "$CONFLUENCE_API_TOKEN" ]; then
echo "Skipping Confluence ingest test because the CONFLUENCE_USER_EMAIL or CONFLUENCE_API_TOKEN env var is not set."
exit 0
fi
PYTHONPATH=. ./unstructured/ingest/main.py \
--download-dir "$DOWNLOAD_DIR" \
--confluence-url https://unstructured-ingest-test.atlassian.net \
--confluence-user-email "$CONFLUENCE_USER_EMAIL" \
--confluence-api-token "$CONFLUENCE_API_TOKEN" \
--confluence-max-num-of-spaces 4 \
--metadata-exclude filename,file_directory,metadata.data_source.date_processed \
--num-processes 2 \
--preserve-downloads \
--reprocess \
--structured-output-dir "$OUTPUT_DIR"
sh "$SCRIPT_DIR"/check-diff-expected-output.sh $OUTPUT_FOLDER_NAME