mirror of
https://github.com/deepset-ai/haystack.git
synced 2026-01-01 01:27:28 +00:00
* Experimental Ci workflow for running tutorials
* Run on every push for now
* Not starting?
* Disabling paths temporarily
* Sort tutorials in natural order
* Install ipython
* remove ipython install
* Try running ipython with sudo
* env.pythonLocation
* Skipping tutorial2 and 9 for speed
* typo
* Use one runner per tutorial, for now
* Typo in dependend job
* Missing quotes broke scripts matrix
* Simplify setup for the tutorials, try to prevent containers conflict
* Remove needless job dependencies
* Try prevent cache issues, fix small Tut10 bug
* Missing deps for running notebook tutorials
* Create three groups of tutorials excluding the longest among them
* remove deps
* use proper bash loop
* Try with a single string
* Fix typo in echo
* Forgot do
* Typo
* Try to make the GraphDB tutorial without launching its own container
* Run notebook and script together
* Whitespace
* separate scrpits and notebooks execution
* Run notebooks first
* Try caching the GoT data before running the scripts
* add note
* fix mkdir
* Fix path
* Update Documentation & Code Style
* missing -r
* Fix folder numbering
* Run notebooks as well
* Typo in notebook command
* complete path in notebook command
* Try with TIKA_LOG_PATH
* Fix folder naming
* Do not use cached data in Tut9
* extracting the number better
* Small tweaks
* Same fix on Tut10 on the notebook
* Exclude GoT cache for tut5 too
* Remove faiss files after tutorial run
* Layout
* fix remove command
* Fix path in tut10 notebook
* Fix typo in node name in tut14
* Third block was too long, rebancing
* Reduce GoT dataset even more, why wasting time after all...
* Fix paths in tut10 again
* do git clean to make sure to cleanup everything (breaks post Python)
* Remove ES file with bad permission at the end of the run
* Split first block, takes >30mins
* take out tut15 for a moment, has an actual bug
* typo
* Forgot rm option
* Simply remove all ES files
* Improve logs of GoT reduction
* Exclude also tut16 from cache to try fix bug
* Replace ll with ls
* Reintroduce 15_TableQA
* Small regrouping
* regrouping to make the min num of runners go for about 30mins
* Add cron schedule and PR paths conditions
* Add some timing information
* Separate tutorials by diff and tutorials by cron
* temp add pull_request to tutorials nightly
* Add badge in README to keep track of the nightly tutorials run
* Remove prefixes from data folder names
* Add fetch depth to get diff with master
* Fix paths again
* typo
* Exclude long-running ones
* Typo
* Fix tutorials.yml as well
* Use head_ref
* Using an action for now
* exclude other files
* Use only the correct command to run the tutorial
* Add long running tutorials in separate runners, just for experiment
* Factor out the complex bash script
* Pass the python path to the bash script
* Fix paths
* adding log statement
* Missing dollarsign
* Resetting variable in loop
* using mini GoT dataset and improving bash script
* change dataset name

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
74 lines
2.5 KiB
YAML
name: Tutorials

on:
  workflow_dispatch: # Activate this workflow manually
  pull_request:
    paths:
    - 'tutorials/*.*'


jobs:

  run:
    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v2
      with:
        fetch-depth: 0

    - run: echo "date=$(date +'%Y-%m-%d')" >> $GITHUB_ENV

    - name: Set up Python 3.7
      uses: actions/setup-python@v2
      with:
        python-version: 3.7

    - name: Cache Python
      uses: actions/cache@v2
      with:
        path: ${{ env.pythonLocation }}
        key: linux-${{ env.date }}-${{ hashFiles('**/setup.py') }}-${{ hashFiles('**/setup.cfg') }}-${{ hashFiles('**/pyproject.toml') }}

    - name: Run Elasticsearch
      run: docker run -d -p 9200:9200 -e "discovery.type=single-node" -e "ES_JAVA_OPTS=-Xms128m -Xmx128m" elasticsearch:7.9.2

    - name: Run Apache Tika
      run: docker run -d -p 9998:9998 -e "TIKA_CHILD_JAVA_OPTS=-JXms128m" -e "TIKA_CHILD_JAVA_OPTS=-JXmx128m" apache/tika:1.24.1

    - name: Run GraphDB
      run: docker run -d -p 7200:7200 --name graphdb-instance-tutorial docker-registry.ontotext.com/graphdb-free:9.4.1-adoptopenjdk11

    - name: Install pdftotext
      run: wget --no-check-certificate https://dl.xpdfreader.com/xpdf-tools-linux-4.04.tar.gz && tar -xvf xpdf-tools-linux-4.04.tar.gz && sudo cp xpdf-tools-linux-4.04/bin64/pdftotext /usr/local/bin

    - name: Install graphviz
      run: sudo apt install libgraphviz-dev graphviz

    # Haystack needs to be reinstalled at this stage to make sure the current commit's version is the one getting tested.
    # The cache can last way longer than a specific action's run, so older Haystack version could be carried over.
    - name: Reinstall Haystack
      run: |
        pip install --upgrade pip
        pip install .[all]
        pip install torch-scatter -f https://data.pyg.org/whl/torch-1.11.0+cpu.html
        pip install pygraphviz
        pip install ipython nbformat

    - name: Cache mini GoT dataset
      run: |
        mkdir -p data/tutorials
        cd data/tutorials
        wget https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt1_mini.zip -q &> /dev/null
        unzip wiki_gameofthrones_txt1_mini.zip
        rm wiki_gameofthrones_txt1_mini.zip

    - uses: jitterbit/get-changed-files@v1
      id: diff
      with:
        format: space-delimited
        token: ${{ secrets.GITHUB_TOKEN }}

    - name: Run tutorials
      run: ./.github/utils/tutorials.sh ${{ env.pythonLocation }} "${{ steps.diff.outputs.added_modified }}" "Tutorial2_ Tutorial9_ Tutorial13_"
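
The final step passes three arguments to `.github/utils/tutorials.sh`: the Python location, the space-delimited list of added or modified files, and a space-delimited list of tutorial name prefixes to skip. The script itself is not shown here, so the following is only a hypothetical sketch of how such a selection could work. The function name `select_tutorials` and its exact behavior are assumptions for illustration, not the repository's actual implementation.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: NOT the real .github/utils/tutorials.sh.
# select_tutorials CHANGED_FILES EXCLUDE_PREFIXES
#   CHANGED_FILES:    space-delimited file paths (as produced by the diff step)
#   EXCLUDE_PREFIXES: space-delimited file name prefixes to skip, e.g. "Tutorial2_ Tutorial9_"
# Prints one selected tutorial path per line.
select_tutorials() {
    local changed_files="$1"
    local exclude_prefixes="$2"
    local file prefix skip

    # Word splitting on the unquoted variables is intentional:
    # both arguments are space-delimited lists.
    for file in $changed_files; do
        # Only files under tutorials/ are candidates.
        case "$file" in
            tutorials/*) ;;
            *) continue ;;
        esac

        skip=0
        for prefix in $exclude_prefixes; do
            # Compare against the file name only, not the full path.
            case "$(basename "$file")" in
                "$prefix"*) skip=1 ;;
            esac
        done

        if [ "$skip" -eq 0 ]; then
            echo "$file"
        fi
    done
}
```

The trailing underscore in prefixes such as `Tutorial1_` matters under this scheme: it keeps an exclusion for tutorial 1 from also matching tutorials 10 through 19.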