153 Commits

Author SHA1 Message Date
Julian Risch
b685409c78
chore: add topic tags to auto generation of release notes (#3008) 2022-08-09 17:12:42 +02:00
Vladimir Blagojevic
50f7d660e2
Add slack hook for test failures (#2996) 2022-08-09 08:27:52 -04:00
Massimiliano Pippi
0e8efdafa9
Add enhanced pydoc-markdown pre-hook (#2979)
* add pydoc-markdown pre-hook

* add more comments, remove debug prints
2022-08-08 12:41:21 +02:00
Tobias Wochinger
065173fe5e
chore: add PR template (#2883)
* chore: add PR template

* ci: update PR template after latest discussions in Notion

* Apply suggestions from code review

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>

* Apply suggestions from code review

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* Update .github/pull_request_template.md

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* docs: re-order and add link

* docs: add new conventions to contributor guidelines

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2022-08-05 18:14:18 +02:00
Massimiliano Pippi
40d07c2038
Enable Opensearch unit tests in Windows CI (#2936)
* enable Opensearch unit tests under Win

* move unit tests into a dedicated job

* skip audio tests on missing dependencies

* avoid failing test collection when soundfile is not available

* Update .github/workflows/tests.yml

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-08-03 19:19:07 +02:00
Sara Zan
669f6f0128
Add git diff to schema checks (#2959) 2022-08-03 09:46:38 -04:00
Massimiliano Pippi
e766bb8684
add code owners (#2950)
* add code owners

* add tutorials folder
2022-08-03 10:48:30 +02:00
Vladimir Blagojevic
86d56b4dfe
Add HF model caching for integration tests (#2909)
* Add HF model caching for integration tests

* Remove windows mode caching - not worth it
2022-07-29 18:17:05 +02:00
Sara Zan
434b1c3682
Disable a few checks in the pre-commit hook (#2929)
* Disable small checks giving trouble to pydoc-markdown and JSON Schema

* Add instructions for JSON schema generator in the workflow logs
2022-07-29 17:02:56 +02:00
Massimiliano Pippi
e7627c3f8b
Use opensearch-py in OpenSearchDocumentStore (#2691)
* add Opensearch extras

* let OpenSearchDocumentStore use opensearch-py

* Update Documentation & Code Style

* fix a bug found after adding tests

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-07-28 10:04:49 +02:00
Zoltan Fedor
adb2b2c312
Add support for BM25 with the Weaviate document store (#2860)
* Upgrading Weaviate used for testing to 1.14.1 from 1.11.0

This has also brought up an issue with one of the tests, which filters for the value "a". That test started to fail because "a" is a default stopword in Weaviate, so I changed it to look for the value "c" instead of "a" to get around the stopword issue.

* Weaviate client upgrade

From v3.3.3 to v3.6.0

* Adding BM25 Retrieval to Weaviate

Weaviate now supports BM25 retrieval in experimental mode and with some limitations (for example, it cannot be combined with filters).
This commit adds support for inverted index (BM25) querying against Weaviate.

* Running Black on the recent code changes

* Update Documentation & Code Style

* Fixing linting issues after code changes by black

* The BM25 query needs to be in all lowercase for now

The BM25 query needs to be provided all lowercase while the functionality is in experimental mode in Weaviate.
See https://app.slack.com/client/T0181DYT9KN/C017EG2SL3H/thread/C017EG2SL3H-1658790227.208119

* Fixing method parameter docstring to highlight that they are not supported in Weaviate

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-27 10:07:13 +02:00
Sara Zan
2d65c380f1
pre-commit hooks (#2819)
* Add pre-commit config

* update contributing guidelines

* try failing the workflow

* add pre-commit to the deps

* updating uninstall instructions

* separate jobs in CI

* make tutorials check fail

* make black check fail

* make openapi check fail

* make yaml schema and api docs checks fail

* highlight the instructions

* Update .pre-commit-config.yaml

Co-authored-by: Tobias Wochinger <mail@tobias-wochinger.de>

* Update CONTRIBUTING.md

Co-authored-by: Tobias Wochinger <mail@tobias-wochinger.de>

* Update CONTRIBUTING.md

Co-authored-by: Tobias Wochinger <mail@tobias-wochinger.de>

* Use black --check

* Add images of the CI

* title level

* feedback

Co-authored-by: Tobias Wochinger <mail@tobias-wochinger.de>
2022-07-26 15:02:15 +02:00
Sara Zan
5d8476eb58
Restart containers in tutorials.sh (#2858)
* restart tutorials in the loop

* remove container steps in tutorials.yml

* forgotten quotes

* unmatched bracket

* give names to containers

* try to limit the log size

* make the containers restart on the scripts as well

* feedback

* Raise integration tests timeout

* raising limit again
2022-07-25 17:35:36 +02:00
Sara Zan
5119acb260
Raise timeout on integration tests (#2880) 2022-07-25 06:43:20 -04:00
Massimiliano Pippi
8ee2b6b403
Add a custom pydoc renderer for Readme.io (#2825)
* add custom pydoc renderer

* create an example

* revert example code
2022-07-22 10:43:51 +02:00
Sara Zan
48644b23fb
Enable CI on tutorials (#2801)
* enable ci on tutorials

* Disable all path restrictions for safety

* actually comment out the paths block

* remove comment
2022-07-18 17:59:55 +02:00
Sara Zan
6b39fbd39c
Mocking Pinecone tests (#2778)
* Integrating the mock into conftest.py

* re-enable workflow

* delete_all

* Update Documentation & Code Style

* remove ValueError

* Add empty response

* wrong condition

* return response

* revert removal of delete_all

* change mock

* Update Documentation & Code Style

* test for rest api, to revert

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-14 20:03:33 +02:00
Massimiliano Pippi
82df677ebf
API tests (#2738)
* clean up tests and run earlier

* use change detection

* better naming, skip ES

* more cleanup

* fix job name

* dummy commit to trigger the CI

* mock away the PDF converter

* make the test compatible with 3.7

* removed leftover

* always run the api tests, use a matrix for the OS

* refactor all the tests

* remove outdated dependency

* pylint

* new abstract method

* adjust for older python versions

* rename pipeline file

* address PR comments
2022-07-14 15:36:28 +02:00
Sara Zan
091711b8c4
Fix Tutorials and Tutorials (nightly) (#2737)
* Remove caching and install audio deps

* Fix `Tutorials` as well

* Run all tutorials even though some fail

* Forgot fi

* fix failure condition

* proper bash string equality

* Enable debug logs

* remove audio files

* Update Documentation & Code Style

* Use the setup action in the Tutorial CI as well

* Try with a file that exists

* Update Documentation & Code Style

* Fix the comments in the tutorials

* Update Documentation & Code Style

* Fix tutorials.sh

* Remove debug logging

* import pprint and try editable install

* Update Documentation & Code Style

* extract no run list

* Add tutorial18 to no run list nightly

* import pprint correctly

* Update Documentation & Code Style

* try making site-packages editable

* Make pythonpath editable every time Tut17 is run on CI

* typo

* fix imports in tut5

* add git clean

* Update Documentation & Code Style

* add comments and remove `-e`

* accidentally deleted a line

* Update .github/utils/tutorials.sh

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2022-07-12 11:22:17 +02:00
Malte Pietsch
ba08fc86f5
Add node to use OpenAI's GPT-3 for QA (#2605)
* first draft of openai node for QA

* Update Documentation & Code Style

* fix mypy. add node to inits

* Update Documentation & Code Style

* fix linter

* Adapt OpenAIGenerator to completions endpoint

* Update Documentation & Code Style

* Fix pylint

* Fix doc strings

* Make use of temperature

* Make use of api key in tests

* Adapt doc strings

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-07-08 13:59:27 +02:00
tstadel
2a7c0139f5
double max heap size for elasticsearch in CI (#2756) 2022-07-05 13:53:32 +02:00
Vladimir Blagojevic
ffb7e4e4bd
GPL tutorial - add GPU header and open in colab button (#2736)
* GPL tutorial - add GPU header and open in colab button

* Add GPL tutorial to run exclusion list
2022-07-04 05:23:39 -04:00
Julian Risch
1781e88802
Upgrade torch to 1.12 (#2741)
* Upgrade torch to 1.12

* upgrade torch-scatter

* add explicit torch-scatter installation

* set torch dependency to range >1.9,<1.13
2022-07-01 20:23:32 +02:00
Vladimir Blagojevic
b08c5f81d1
Add GPL adaptation tutorial (#2632)
* Add GPL adaptation tutorial

* Latest round of Aga's corrections

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-26 02:44:57 -04:00
Sara Zan
426f49979b
Change repo with repository in python_cache (#2731)
* Change repo with repository

* remove name

* using owner and name

* use owner name

* replace name with login

* Trying with the PR context instead
2022-06-24 18:36:19 +02:00
Sara Zan
6a7152044e
add repo name as well (#2729) 2022-06-24 17:08:28 +02:00
Sara Zan
13514f960d
Specify ref in action (#2727) 2022-06-24 15:56:17 +02:00
Sara Zan
400d2cdf77
Fix audio tests on CI (#2718)
* Update Documentation & Code Style

* fix huggingface-hub version

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-24 11:36:31 +02:00
Rob Pasternak
b87c0c950b
Tutorial 14 edit (#2663)
* Rewrite Tutorial 14 for increased user-friendliness

* Update Tutorial14 .py file to match .ipynb file

* Update Documentation & Code Style

* unblock the ci

* ignore error in jitterbit/get-changed-files

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
2022-06-22 13:03:07 +02:00
Sara Zan
505ababf43
Skip Pinecone tests (#2696)
* comment out Pinecone tests block

* Add comment
2022-06-21 14:49:36 +02:00
Massimiliano Pippi
5d255f0e4a
replace question issue with link to discussions (#2697) 2022-06-21 14:10:11 +02:00
Sara Zan
a6c06ee376
Update contributor's checklists in PR template (#2659)
* Split contributor's and reviewer's checklists

* contributor-centric checklist

* Move issues at the top and split entry

* phrasing
2022-06-21 10:11:18 +02:00
Sara Zan
a26c042994
Fix typo in code_and_docs.sh (#2662)
* Fix typo in code_and_docs.sh & install ffmpeg in autoformat.yml

* apt update to get ffmpeg

* Update Documentation & Code Style

* Add header and better error message

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-15 13:50:55 +02:00
Sara Zan
776eba0cd1
Remove pull_request from triggers (#2661) 2022-06-15 10:14:22 +02:00
Sara Zan
584e046642
AnswerToSpeech (#2584)
* Add new audio answer primitives

* Add AnswerToSpeech

* Add dependency group

* Update Documentation & Code Style

* Extract TextToSpeech in a helper class, create DocumentToSpeech and primitives

* Add tests

* Update Documentation & Code Style

* Add ability to compress audio and more tests

* Add audio group to test, all and all-gpu

* fix pylint

* Update Documentation & Code Style

* Accidental git tag

* Try pleasing mypy

* Update Documentation & Code Style

* fix pylint

* Add warning for missing OS library and support in CI

* Try fixing mypy

* Update Documentation & Code Style

* Add docs, simplify args for audio nodes and add tutorials

* Fix mypy

* Fix run_batch

* Feedback on tutorials

* fix mypy and pylint

* Fix mypy again

* Fix mypy yet again

* Fix the ci

* Fix dicts merge and install ffmpeg on CI

* Make the audio nodes import safe

* Trying to increase tolerance in audio test

* Fix import paths

* fix linter

* Update Documentation & Code Style

* Add audio libs in unit tests

* Update _text_to_speech.py

* Update answer_to_speech.py

* Use dedicated dataset & update telemetry

* Remove  and use distilled roberta

* Revert special primitives so that the nodes run in indexing

* Improve tutorials and fix smaller bugs

* Update Documentation & Code Style

* Fix serialization issue

* Update Documentation & Code Style

* Improve tutorial

* Update Documentation & Code Style

* Update _text_to_speech.py

* Minor lg updates

* Minor lg updates to tutorial

* Making indexing work in tutorials

* Update Documentation & Code Style

* Improve docstrings

* Try to use GPU when available

* Update Documentation & Code Style

* Fix mypy and pylint

* Try to pass the device correctly

* Update Documentation & Code Style

* Use type of device

* use .cpu()

* Improve .ipynb

* update apt index to be able to download libsndfile1

* Fix SpeechDocument.from_dict()

* Change pip URL

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-06-15 10:13:18 +02:00
Sara Zan
735ffa635b
[CI refactoring] Tutorials on CI (#2547)
* Experimental Ci workflow for running tutorials

* Run on every push for now

* Not starting?

* Disabling paths temporarily

* Sort tutorials in natural order

* Install ipython

* remove ipython install

* Try running ipython with sudo

* env.pythonLocation

* Skipping tutorial2 and 9 for speed

* typo

* Use one runner per tutorial, for now

* Typo in dependent job

* Missing quotes broke scripts matrix

* Simplify setup for the tutorials, try to prevent containers conflict

* Remove needless job dependencies

* Try prevent cache issues, fix small Tut10 bug

* Missing deps for running notebook tutorials

* Create three groups of tutorials excluding the longest among them

* remove deps

* use proper bash loop

* Try with a single string

* Fix typo in echo

* Forgot do

* Typo

* Try to make the GraphDB tutorial without launching its own container

* Run notebook and script together

* Whitespace

* separate scripts and notebooks execution

* Run notebooks first

* Try caching the GoT data before running the scripts

* add note

* fix mkdir

* Fix path

* Update Documentation & Code Style

* missing -r

* Fix folder numbering

* Run notebooks as well

* Typo in notebook command

* complete path in notebook command

* Try with TIKA_LOG_PATH

* Fix folder naming

* Do not use cached data in Tut9

* extracting the number better

* Small tweaks

* Same fix on Tut10 on the notebook

* Exclude GoT cache for tut5 too

* Remove faiss files after tutorial run

* Layout

* fix remove command

* Fix path in tut10 notebook

* Fix typo in node name in tut14

* Third block was too long, rebalancing

* Reduce GoT dataset even more, why wasting time after all...

* Fix paths in tut10 again

* do git clean to make sure to cleanup everything (breaks post Python)

* Remove ES file with bad permission at the end of the run

* Split first block, takes >30mins

* take out tut15 for a moment, has an actual bug

* typo

* Forgot rm option

* Simply remove all ES files

* Improve logs of GoT reduction

* Exclude also tut16 from cache to try fix bug

* Replace ll with ls

* Reintroduce 15_TableQA

* Small regrouping

* regrouping to make the min num of runners go for about 30mins

* Add cron schedule and PR paths conditions

* Add some timing information

* Separate tutorials by diff and tutorials by cron

* temp add pull_request to tutorials nightly

* Add badge in README to keep track of the nightly tutorials run

* Remove prefixes from data folder names

* Add fetch depth to get diff with master

* Fix paths again

* typo

* Exclude long-running ones

* Typo

* Fix tutorials.yml as well

* Use head_ref

* Using an action for now

* exclude other files

* Use only the correct command to run the tutorial

* Add long running tutorials in separate runners, just for experiment

* Factor out the complex bash script

* Pass the python path to the bash script

* Fix paths

* adding log statement

* Missing dollarsign

* Resetting variable in loop

* using mini GoT dataset and improving bash script

* change dataset name

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-15 09:53:36 +02:00
Sara Zan
8d7439c623
Move autoformat-check.yml into tests.yml (#2635) 2022-06-10 18:22:16 +02:00
Sara Zan
9968c373d2
make 'ready for review' an event that triggers the tests (#2643) 2022-06-09 09:23:38 +02:00
Sara Zan
c2d2faf31e
Add directive in tests.yml (#2637) 2022-06-07 13:31:19 +02:00
Sara Zan
59608ca474
[CI Refactoring] Workflow refactoring (#2576)
* Unify CI tests (from #2466)

* Update Documentation & Code Style

* Change folder names

* Fix markers list

* Remove marker 'slow', replaced with 'integration'

* Soften children check

* Start ES first so it has time to boot while Python is setup

* Run the full workflow

* Try to make pip upgrade on Windows

* Set KG tests as integration

* Update Documentation & Code Style

* typo

* faster pylint

* Make Pylint use the cache

* filter diff files for pylint

* debug pylint statement

* revert pylint changes

* Remove path from asserted log (fails on Windows)

* Skip preprocessor test on Windows

* Tackling Windows specific failures

* Fix pytest command for windows suites

* Remove \ from command

* Move poppler test into integration

* Skip opensearch test on windows

* Add tolerance in reader sas score for Windows

* Another pytorch approx

* Raise time limit for unit tests :(

* Skip poppler test on Windows CI

* Specify to pull with FF only in docs check

* temporarily run the docs check immediately

* Allow merge commit for now

* Try without fetch depth

* Accelerating test

* Accelerating test

* Add repository and ref alongside fetch-depth

* Separate out code&docs check from tests

* Use setup-python cache

* Delete custom action

* Remove the pull step in the docs check, will find a way to run on bot commits

* Add requirements.txt in .github for caching

* Actually install dependencies

* Change deps group for pylint

* Unclear why the requirements.txt is still required :/

* Fix the code check python setup

* Install all deps for pylint

* Make the autoformat check depend on tests and doc updates workflows

* Try installing dependencies in another order

* Try again to install the deps

* quoting the paths

* Add back the requirements

* Try again to install rest_api and ui

* Change deps group

* Duplicate haystack install line

* See if the cache is the problem

* Disable also in mypy, who knows

* split the install step

* Split install step everywhere

* Revert "Separate out code&docs check from tests"

This reverts commit 1cd59b15ffc5b984e1d642dcbf4c8ccc2bb6c9bd.

* Add back the action

* Proactive support for audio (see text2speech branch)

* Fix label generator tests

* Remove install of libsndfile1 on win temporarily

* exclude audio tests on win

* install ffmpeg for integration tests

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-07 09:23:03 +02:00
Sara Zan
89bb1ca139
[CI refactoring] Improve autoformat.yml (#2556)
* Restructure autoformat to run a single script

* Reduce diff for autoformat.yml

* Reduce diff on linux_ci.yml
2022-05-18 20:02:43 +02:00
Julian Risch
70ca1e9fc6
Smaller demo instance type (#2564)
This PR changes the instance type of the public Haystack demo from p3.2xlarge to g4dn.2xlarge.
g4dn.2xlarge has 1 GPU, 8 vCPUs, 32 GiB of memory
p3.2xlarge had 1 GPU, 8 vCPUs, 61 GiB of memory
which results in 75% lower costs with g4dn.2xlarge.

I also tried out the even smaller g4dn.xlarge, which has 1 GPU, 4 vCPUs, 16 GiB of memory. However, the memory was not enough to run the demo. I tried out multiple requests at the same time and it worked well with g4dn.2xlarge. Requests are slightly slower than with the more powerful instance type, but it's hard to notice.
2022-05-17 12:47:15 +02:00
Sara Zan
ff4303c51b
[CI refactoring] Categorize tests into folders (#2554)
* Categorize tests into folders

* Fix linux_ci.yml and an import

* Wrong path
2022-05-17 09:55:53 +01:00
Ivan Lopez
a2a99f79b1
Fix docker image tag with semantic version for releases (#2548)
* Fix docker tag with semantic version for releases

* Prepend latest docker tag with tagprefix in cache-from
2022-05-16 13:26:33 +02:00
bogdankostic
300ee1ac83
Upgrade torch version to 1.11 (#2538)
* Bump torch version

* Upgrade torch version in torch-scatter
2022-05-13 14:45:53 +02:00
Sara Zan
15a9ff6f67
PR template mention of enabling Actions (#2523)
* Update version to 1.4.1rc0

* Add hint of enabling action on the fork in the PR template

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-10 09:46:09 +02:00
Sara Zan
f3e0ba4be9
Fix OpenSearchDocumentStore's __init__ (#2498)
* Move super in OpenSearchDocumentStore and add small test

* Update Documentation & Code Style

* Add Opensearch container to the CI

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-05 10:38:09 +02:00
tstadel
7498c7c6fb
Fix and use delete_index instead of delete_documents in tests (#2453)
* use delete_index instead of delete_documents in tests

* fix delete_index

* fix delete_index() in memory and milvus

* fix imports

* fix memory keyerrors

* Update Documentation & Code Style

* increase timeout for pinecone tests to 60 minutes

* clean get_document_store()

* use recreate_index in tests

* Update Documentation & Code Style

* fix tests

* fix remaining tests

* log index deleted

* fix test_eval_pipeline

* simplify existing index detection in weaviate

* delete label_index on recreate_index for pinecone and milvus

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-26 19:06:30 +02:00
Sara Zan
4eec2dc45e
Change YAML version exception into a warning (#2385)
* Change exception into warning, add strict_version param, and remove compatibility between schemas

* Simplify update_json_schema

* Rename unstable into master

* Prevent validate_config from changing the config to validate

* Fix version validation and add tests

* Rename master into ignore

* Complete parameter rename

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-19 16:08:08 +02:00
Sara Zan
8abf11fbd3
Update pdftotext also on pinecone and milvus1 CI jobs (#2433)
* Upgrade pdftotext also on pinecone and milvus1 jobs

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-19 16:06:27 +02:00