mirror of
https://github.com/deepset-ai/haystack.git
synced 2025-12-31 09:10:15 +00:00
* Experimental Ci workflow for running tutorials * Run on every push for now * Not starting? * Disabling paths temporarily * Sort tutorials in natural order * Install ipython * remove ipython install * Try running ipython with sudo * env.pythonLocation * Skipping tutorial2 and 9 for speed * typo * Use one runner per tutorial, for now * Typo in dependend job * Missing quotes broke scripts matrix * Simplify setup for the tutorials, try to prevent containers conflict * Remove needless job dependencies * Try prevent cache issues, fix small Tut10 bug * Missing deps for running notebook tutorials * Create three groups of tutorials excluding the longest among them * remove deps * use proper bash loop * Try with a single string * Fix typo in echo * Forgot do * Typo * Try to make the GraphDB tutorial without launching its own container * Run notebook and script together * Whitespace * separate scrpits and notebooks execution * Run notebooks first * Try caching the GoT data before running the scripts * add note * fix mkdir * Fix path * Update Documentation & Code Style * missing -r * Fix folder numbering * Run notebooks as well * Typo in notebook command * complete path in notebook command * Try with TIKA_LOG_PATH * Fix folder naming * Do not use cached data in Tut9 * extracting the number better * Small tweaks * Same fix on Tut10 on the notebook * Exclude GoT cache for tut5 too * Remove faiss files after tutorial run * Layout * fix remove command * Fix path in tut10 notebook * Fix typo in node name in tut14 * Third block was too long, rebancing * Reduce GoT dataset even more, why wasting time after all... 
* Fix paths in tut10 again * do git clean to make sure to cleanup everything (breaks post Python) * Remove ES file with bad permission at the end of the run * Split first block, takes >30mins * take out tut15 for a moment, has an actual bug * typo * Forgot rm option * Simply remove all ES files * Improve logs of GoT reduction * Exclude also tut16 from cache to try fix bug * Replace ll with ls * Reintroduce 15_TableQA * Small regrouping * regrouping to make the min num of runners go for about 30mins * Add cron schedule and PR paths conditions * Add some timing information * Separate tutorials by diff and tutorials by cron * temp add pull_request to tutorials nightly * Add badge in README to keep track of the nightly tutorials run * Remove prefixes from data folder names * Add fetch depth to get diff with master * Fix paths again * typo * Exclude long-running ones * Typo * Fix tutorials.yml as well * Use head_ref * Using an action for now * exclude other files * Use only the correct command to run the tutorial * Add long running tutorials in separate runners, just for experiment * Factor out the complex bash script * Pass the python path to the bash script * Fix paths * adding log statement * Missing dollarsign * Resetting variable in loop * using mini GoT dataset and improving bash script * change dataset name Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
75 lines
2.4 KiB
Bash
Executable File
#!/bin/bash

export LAUNCH_GRAPHDB=0       # See tut 10 - GraphDB is already running in CI
export TIKA_LOG_PATH="$PWD"   # Avoid permission denied errors while importing tika
set -e                        # Fail on any error in the loops below

python_path=$1       # Python installation root; ipython is run from $python_path/bin
files_changed=$2     # Whitespace-separated list of files changed in the PR
exclusion_list=$3    # Whitespace-separated name fragments of tutorials to skip
no_got_tutorials='4_FAQ_style_QA 5_Evaluation 7_RAG_Generator 8_Preprocessing 10_Knowledge_Graph 15_TableQA 16_Document_Classifier_at_Index_Time'

echo "Files changed in this PR: $files_changed"
echo "Excluding: $exclusion_list"

# Collect the tutorials to run
scripts_to_run=""
for script in $files_changed; do

    # Keep only tutorial scripts (.py) and notebooks (.ipynb).
    # The extension is matched as a suffix, consistent with the check that picks the runner below.
    if [[ "$script" != *"tutorials/Tutorial"* || ( "$script" != *".py" && "$script" != *".ipynb" ) ]]; then
        echo "- not a tutorial: $script"
        continue
    fi

    skip_to_next=0
    for excluded in $exclusion_list; do
        if [[ "$script" == *"$excluded"* ]]; then skip_to_next=1; fi
    done
    if [[ $skip_to_next == 1 ]]; then
        echo "- excluded: $script"
        continue
    fi

    scripts_to_run="$scripts_to_run $script"
done

for script in $scripts_to_run; do

    echo ""
    echo "##################################################################################"
    echo "##################################################################################"
    echo "## Running $script ..."
    echo "##################################################################################"
    echo "##################################################################################"

    # Use the reduced GoT dataset unless the tutorial is listed in no_got_tutorials
    reduce_dataset=1
    for no_got_tut in $no_got_tutorials; do
        if [[ "$script" == *"$no_got_tut"* ]]; then
            reduce_dataset=0
        fi
    done

    if [[ $reduce_dataset == 1 ]]; then
        # Copy the reduced GoT data into a folder named after the tutorial
        # to trigger the caching mechanism of `fetch_archive_from_http`
        echo "Using reduced GoT dataset"
        # Strip everything up to "tutorials/Tutorial", then keep the number before the first underscore
        no_prefix=${script##*"tutorials/Tutorial"}
        tutorial_number=${no_prefix%%_*}
        cp -r data/tutorials "data/tutorial${tutorial_number}"
    else
        echo "NOT using reduced GoT dataset!"
    fi

    if [[ "$script" == *".py" ]]; then
        time python "$script"
    else
        sudo "$python_path/bin/ipython" -c "%run $script"
    fi
    git clean -f

done

# Remove these folders: they cause permission errors on Post Cache
sudo rm -rf data/
sudo rm -rf /home/runner/work/haystack/haystack/elasticsearch-7.9.2/
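
The path handling in the script above can be tried in isolation. The following is a minimal sketch, not part of the original script: the helper names `is_tutorial` and `tutorial_number` are hypothetical, and the sample paths are made up for illustration. It assumes tutorial files are named `Tutorial<number>_<name>.py` or `.ipynb` under a `tutorials/` folder, as the checks in the script suggest.

```shell
#!/bin/bash
# Sketch only: is_tutorial and tutorial_number are hypothetical helper names.

# Succeeds when the path looks like a tutorial script or notebook.
is_tutorial() {
    case "$1" in
        *tutorials/Tutorial*.py | *tutorials/Tutorial*.ipynb) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints the number that follows "Tutorial" in the file name.
tutorial_number() {
    # Drop everything up to and including "tutorials/Tutorial" ...
    no_prefix=${1##*tutorials/Tutorial}
    # ... then drop everything from the first underscore onwards.
    echo "${no_prefix%%_*}"
}

# Made-up sample paths, for illustration only.
for path in tutorials/Tutorial10_Knowledge_Graph.py docs/index.md; do
    if is_tutorial "$path"; then
        echo "$path -> data/tutorial$(tutorial_number "$path")"
    else
        echo "$path -> not a tutorial"
    fi
done
```

Using parameter expansion (`##` and `%%`) rather than splitting into an array avoids relying on word splitting, which is one possible reason to prefer it when file names may contain unexpected characters.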