19 Commits

Author SHA1 Message Date
Vladimir Blagojevic
86d56b4dfe
Add HF model caching for integration tests (#2909)
* Add HF model caching for integration tests

* Remove windows mode caching - not worth it
2022-07-29 18:17:05 +02:00
Massimiliano Pippi
e7627c3f8b
Use opensearch-py in OpenSearchDocumentStore (#2691)
* add Opensearch extras

* let OpenSearchDocumentStore use opensearch-py

* Update Documentation & Code Style

* fix a bug found after adding tests

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-07-28 10:04:49 +02:00
Zoltan Fedor
adb2b2c312
Add support for BM25 with the Weaviate document store (#2860)
* Upgrading Weaviate used for testing to 1.14.1 from 1.11.0

This has also brought up an issue with one of the test filtering for value "a". This test has started to fail, as "a" is a default stopword in Weaviate, so I have changed this test to look for value "c" instead of value "a" to get around the stopword issue.

* Weaviate client upgrade

From v3.3.3 to v3.6.0

* Adding BM25 Retrieval to Weaviate

Weaviate now supports BM25 retrieval in experiment mode and with some limitations (like it cannot be combined with filters).
This commit adds support for inverted index (BM25) querying against Weaviate.

* Running Black on the recent code changes

* Update Documentation & Code Style

* Fixing linting issues after code changes by black

* The BM25 query needs to be in all lowercase for now

The BM25 query needs to be provided all lowercase while the functionality is in experimental mode in Weaviate.
See https://app.slack.com/client/T0181DYT9KN/C017EG2SL3H/thread/C017EG2SL3H-1658790227.208119

* Fixing method parameter docstring to highlight that they are not supported in Weaviate

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-27 10:07:13 +02:00
Sara Zan
2d65c380f1
pre-commit hooks (#2819)
* Add pre-commit config

* update contributing guidelines

* try failing the workflow

* add pre-commit to the deps

* updating uninstall instructions

* separate jobs in CI

* make tutorials check fail

* make black check fail

* make openapi check fail

* make yaml schema and api docs checks fail

* highlight the instructions

* Update .pre-commit-config.yaml

Co-authored-by: Tobias Wochinger <mail@tobias-wochinger.de>

* Update CONTRIBUTING.md

Co-authored-by: Tobias Wochinger <mail@tobias-wochinger.de>

* Update CONTRIBUTING.md

Co-authored-by: Tobias Wochinger <mail@tobias-wochinger.de>

* Use black --check

* Add images of the CI

* title level

* feedback

Co-authored-by: Tobias Wochinger <mail@tobias-wochinger.de>
2022-07-26 15:02:15 +02:00
Sara Zan
5119acb260
Raise timeout on integration tests (#2880) 2022-07-25 06:43:20 -04:00
Sara Zan
48644b23fb
Enable CI on tutorials (#2801)
* enable ci on tutorials

* Disable all path restrictions for safety

* actually comment out the paths block

* remove comment
2022-07-18 17:59:55 +02:00
Sara Zan
6b39fbd39c
Mocking Pinecone tests (#2778)
* Integrating the mock into conftest.py

* re-enable workflow

* delete_all

* Update Documentation & Code Style

* remove ValueError

* Add empty response

* wrong condition

* return response

* revert removal of delete_all

* change mock

* Update Documentation & Code Style

* test for rest api, to revert

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-14 20:03:33 +02:00
Massimiliano Pippi
82df677ebf
API tests (#2738)
* clean up tests and run earlier

* use change detection

* better naming, skip ES

* more cleanup

* fix job name

* dummy commit to trigger the CI

* mock away the PDF converter

* make the test compatible with 3.7

* removed leftover

* always run the api tests, use a matrix for the OS

* refactor all the tests

* remove outdated dependency

* pylint

* new abstract method

* adjust for older python versions

* rename pipeline file

* address PR comments
2022-07-14 15:36:28 +02:00
Malte Pietsch
ba08fc86f5
Add node to use OpenAI's GPT-3 for QA (#2605)
* first draft of openai node for QA

* Update Documentation & Code Style

* fix mypy. add node to inits

* Update Documentation & Code Style

* fix linter

* Adapt OpenAIGenerator to completions endpoint

* Update Documentation & Code Style

* Fix pylint

* Fix doc strings

* Make use of temperature

* Make use of api key in tests

* Adapt doc strings

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-07-08 13:59:27 +02:00
tstadel
2a7c0139f5
double max heap size for elasticsearch in CI (#2756) 2022-07-05 13:53:32 +02:00
Julian Risch
1781e88802
Upgrade torch to 1.12 (#2741)
* Upgrade torch to 1.12

* upgrade torch-scatter

* add explicit torch-scatter installation

* set torch dependency to range >1.9,<1.13
2022-07-01 20:23:32 +02:00
Sara Zan
400d2cdf77
Fix audio tests on CI (#2718)
* Update Documentation & Code Style

* fix huggingface-hub version

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-24 11:36:31 +02:00
Sara Zan
505ababf43
Skip Pinecone tests (#2696)
* comment out Pinecone tests block

* Add comment
2022-06-21 14:49:36 +02:00
Sara Zan
584e046642
AnswerToSpeech (#2584)
* Add new audio answer primitives

* Add AnswerToSpeech

* Add dependency group

* Update Documentation & Code Style

* Extract TextToSpeech in a helper class, create DocumentToSpeech and primitives

* Add tests

* Update Documentation & Code Style

* Add ability to compress audio and more tests

* Add audio group to test, all and all-gpu

* fix pylint

* Update Documentation & Code Style

* Accidental git tag

* Try pleasing mypy

* Update Documentation & Code Style

* fix pylint

* Add warning for missing OS library and support in CI

* Try fixing mypy

* Update Documentation & Code Style

* Add docs, simplify args for audio nodes and add tutorials

* Fix mypy

* Fix run_batch

* Feedback on tutorials

* fix mypy and pylint

* Fix mypy again

* Fix mypy yet again

* Fix the ci

* Fix dicts merge and install ffmpeg on CI

* Make the audio nodes import safe

* Trying to increase tolerance in audio test

* Fix import paths

* fix linter

* Update Documentation & Code Style

* Add audio libs in unit tests

* Update _text_to_speech.py

* Update answer_to_speech.py

* Use dedicated dataset & update telemetry

* Remove  and use distilled roberta

* Revert special primitives so that the nodes run in indexing

* Improve tutorials and fix smaller bugs

* Update Documentation & Code Style

* Fix serialization issue

* Update Documentation & Code Style

* Improve tutorial

* Update Documentation & Code Style

* Update _text_to_speech.py

* Minor lg updates

* Minor lg updates to tutorial

* Making indexing work in tutorials

* Update Documentation & Code Style

* Improve docstrings

* Try to use GPU when available

* Update Documentation & Code Style

* Fixi mypy and pylint

* Try to pass the device correctly

* Update Documentation & Code Style

* Use type of device

* use .cpu()

* Improve .ipynb

* update apt index to be able to download libsndfile1

* Fix SpeechDocument.from_dict()

* Change pip URL

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-06-15 10:13:18 +02:00
Sara Zan
735ffa635b
[CI refactoring] Tutorials on CI (#2547)
* Experimental Ci workflow for running tutorials

* Run on every push for now

* Not starting?

* Disabling paths temporarily

* Sort tutorials in natural order

* Install ipython

* remove ipython install

* Try running ipython with sudo

* env.pythonLocation

* Skipping tutorial2 and 9 for speed

* typo

* Use one runner per tutorial, for now

* Typo in dependend job

* Missing quotes broke scripts matrix

* Simplify setup for the tutorials, try to prevent containers conflict

* Remove needless job dependencies

* Try prevent cache issues, fix small Tut10 bug

* Missing deps for running notebook tutorials

* Create three groups of tutorials excluding the longest among them

* remove deps

* use proper bash loop

* Try with a single string

* Fix typo in echo

* Forgot do

* Typo

* Try to make the GraphDB tutorial without launching its own container

* Run notebook and script together

* Whitespace

* separate scrpits and notebooks execution

* Run notebooks first

* Try caching the GoT data before running the scripts

* add note

* fix mkdir

* Fix path

* Update Documentation & Code Style

* missing -r

* Fix folder numbering

* Run notebooks as well

* Typo in notebook command

* complete path in notebook command

* Try with TIKA_LOG_PATH

* Fix folder naming

* Do not use cached data in Tut9

* extracting the number better

* Small tweaks

* Same fix on Tut10 on the notebook

* Exclude GoT cache for tut5 too

* Remove faiss files after tutorial run

* Layout

* fix remove command

* Fix path in tut10 notebook

* Fix typo in node name in tut14

* Third block was too long, rebancing

* Reduce GoT dataset even more, why wasting time after all...

* Fix paths in tut10 again

* do git clean to make sure to cleanup everything (breaks post Python)

* Remove ES file with bad permission at the end of the run

* Split first block, takes >30mins

* take out tut15 for a moment, has an actual bug

* typo

* Forgot rm option

* Simply remove all ES files

* Improve logs of GoT reduction

* Exclude also tut16 from cache to try fix bug

* Replace ll with ls

* Reintroduce 15_TableQA

* Small regrouping

* regrouping to make the min num of runners go for about 30mins

* Add cron schedule and PR paths conditions

* Add some timing information

* Separate tutorials by diff and tutorials by cron

* temp add pull_request to tutorials nightly

* Add badge in README to keep track of the nightly tutorials run

* Remove prefixes from data folder names

* Add fetch depth to get diff with master

* Fix paths again

* typo

* Exclude long-running ones

* Typo

* Fix tutorials.yml as well

* Use head_ref

* Using an action for now

* exclude other files

* Use only the correct command to run the tutorial

* Add long running tutorials in separate runners, just for experiment

* Factor out the complex bash script

* Pass the python path to the bash script

* Fix paths

* adding log statement

* Missing dollarsign

* Resetting variable in loop

* using mini GoT dataset and improving bash script

* change dataset name

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-15 09:53:36 +02:00
Sara Zan
8d7439c623
Move autoformat-check.yml into tests.yml (#2635) 2022-06-10 18:22:16 +02:00
Sara Zan
9968c373d2
make 'ready for review' an event that triggers the tests (#2643) 2022-06-09 09:23:38 +02:00
Sara Zan
c2d2faf31e
Add directive in tests.yml (#2637) 2022-06-07 13:31:19 +02:00
Sara Zan
59608ca474
[CI Refactoring] Workflow refactoring (#2576)
* Unify CI tests (from #2466)

* Update Documentation & Code Style

* Change folder names

* Fix markers list

* Remove marker 'slow', replaced with 'integration'

* Soften children check

* Start ES first so it has time to boot while Python is setup

* Run the full workflow

* Try to make pip upgrade on Windows

* Set KG tests as integration

* Update Documentation & Code Style

* typo

* faster pylint

* Make Pylint use the cache

* filter diff files for pylint

* debug pylint statement

* revert pylint changes

* Remove path from asserted log (fails on Windows)

* Skip preprocessor test on Windows

* Tackling Windows specific failures

* Fix pytest command for windows suites

* Remove \ from command

* Move poppler test into integration

* Skip opensearch test on windows

* Add tolerance in reader sas score for Windows

* Another pytorch approx

* Raise time limit for unit tests :(

* Skip poppler test on Windows CI

* Specify to pull with FF only in docs check

* temporarily run the docs check immediately

* Allow merge commit for now

* Try without fetch depth

* Accelerating test

* Accelerating test

* Add repository and ref alongside fetch-depth

* Separate out code&docs check from tests

* Use setup-python cache

* Delete custom action

* Remove the pull step in the docs check, will find a way to run on bot commits

* Add requirements.txt in .github for caching

* Actually install dependencies

* Change deps group for pylint

* Unclear why the requirements.txt is still required :/

* Fix the code check python setup

* Install all deps for pylint

* Make the autoformat check depend on tests and doc updates workflows

* Try installing dependencies in another order

* Try again to install the deps

* quoting the paths

* Ad back the requirements

* Try again to install rest_api and ui

* Change deps group

* Duplicate haystack install line

* See if the cache is the problem

* Disable also in mypy, who knows

* split the install step

* Split install step everywhere

* Revert "Separate out code&docs check from tests"

This reverts commit 1cd59b15ffc5b984e1d642dcbf4c8ccc2bb6c9bd.

* Add back the action

* Proactive support for audio (see text2speech branch)

* Fix label generator tests

* Remove install of libsndfile1 on win temporarily

* exclude audio tests on win

* install ffmpeg for integration tests

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-07 09:23:03 +02:00