haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2026-01-06 03:57:19 +00:00

Author	SHA1	Message	Date
Julian Risch	b685409c78	chore: add topic tags to auto generation of release notes (#3008 )	2022-08-09 17:12:42 +02:00
Vladimir Blagojevic	50f7d660e2	Add slack hook for test failures (#2996 )	2022-08-09 08:27:52 -04:00
Massimiliano Pippi	0e8efdafa9	Add enhanced pydoc-markdown pre-hook (#2979 ) * add pydoc-markdown pre-hook * add more comments, remove debug prints	2022-08-08 12:41:21 +02:00
Tobias Wochinger	065173fe5e	chore: add PR template (#2883 ) * chore: add PR template * ci: update PR template after latest discussions in Notion * Apply suggestions from code review Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai> * Apply suggestions from code review Co-authored-by: Massimiliano Pippi <mpippi@gmail.com> * Update .github/pull_request_template.md Co-authored-by: Massimiliano Pippi <mpippi@gmail.com> * docs: re-order and add link * docs: add new conventions to contributor guidelines Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai> Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2022-08-05 18:14:18 +02:00
Massimiliano Pippi	40d07c2038	Enable Opensearch unit tests in Windows CI (#2936 ) * enable Opensearch unit tests under Win * move unit tests into a dedicated job * skip audio tests on missing dependencies * avoid failing test collection when soundfile is not available * Update .github/workflows/tests.yml Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai> Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>	2022-08-03 19:19:07 +02:00
Sara Zan	669f6f0128	Add git diff to schema checks (#2959 )	2022-08-03 09:46:38 -04:00
Massimiliano Pippi	e766bb8684	add code owners (#2950 ) * add code owners * add tutorials folder	2022-08-03 10:48:30 +02:00
Vladimir Blagojevic	86d56b4dfe	Add HF model caching for integration tests (#2909 ) * Add HF model caching for integration tests * Remove windows mode caching - not worth it	2022-07-29 18:17:05 +02:00
Sara Zan	434b1c3682	Disable a few checks in the pre-commit hook (#2929 ) * Disable small checks giving trouble to pydoc-markdown and JSON Schema * Add instructions for JSON schema generator in the workflow logs	2022-07-29 17:02:56 +02:00
Massimiliano Pippi	e7627c3f8b	Use opensearch-py in OpenSearchDocumentStore (#2691 ) * add Opensearch extras * let OpenSearchDocumentStore use opensearch-py * Update Documentation & Code Style * fix a bug found after adding tests Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>	2022-07-28 10:04:49 +02:00
Zoltan Fedor	adb2b2c312	Add support for BM25 with the Weaviate document store (#2860 ) * Upgrading Weaviate used for testing to 1.14.1 from 1.11.0 This has also brought up an issue with one of the test filtering for value "a". This test has started to fail, as "a" is a default stopword in Weaviate, so I have changed this test to look for value "c" instead of value "a" to get around the stopword issue. * Weaviate client upgrade From v3.3.3 to v3.6.0 * Adding BM25 Retrieval to Weaviate Weaviate now supports BM25 retrieval in experiment mode and with some limitations (like it cannot be combined with filters). This commit adds support for inverted index (BM25) querying against Weaviate. * Running Black on the recent code changes * Update Documentation & Code Style * Fixing linting issues after code changes by black * The BM25 query needs to be in all lowercase for now The BM25 query needs to be provided all lowercase while the functionality is in experimental mode in Weaviate. See https://app.slack.com/client/T0181DYT9KN/C017EG2SL3H/thread/C017EG2SL3H-1658790227.208119 * Fixing method parameter docstring to highlight that they are not supported in Weaviate * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-07-27 10:07:13 +02:00
Sara Zan	2d65c380f1	pre-commit hooks (#2819 ) * Add pre-commit config * update contributing guidelines * try failing the workflow * add pre-commit to the deps * updating uninstall instructions * separate jobs in CI * make tutorials check fail * make black check fail * make openapi check fail * make yaml schema and api docs checks fail * highlight the instructions * Update .pre-commit-config.yaml Co-authored-by: Tobias Wochinger <mail@tobias-wochinger.de> * Update CONTRIBUTING.md Co-authored-by: Tobias Wochinger <mail@tobias-wochinger.de> * Update CONTRIBUTING.md Co-authored-by: Tobias Wochinger <mail@tobias-wochinger.de> * Use black --check * Add images of the CI * title level * feedback Co-authored-by: Tobias Wochinger <mail@tobias-wochinger.de>	2022-07-26 15:02:15 +02:00
Sara Zan	5d8476eb58	Restart containers in `tutorials.sh` (#2858 ) * restart tutorials in the loop * remove container steps in tutorials.yml * forgotten quotes * unmatched bracket * give names to containers * try to limit the log size * make the containers restart on the scripts as well * feedback * Raise integration tests timeout * raising limit again	2022-07-25 17:35:36 +02:00
Sara Zan	5119acb260	Raise timeout on integration tests (#2880 )	2022-07-25 06:43:20 -04:00
Massimiliano Pippi	8ee2b6b403	Add a custom pydoc renderer for Readme.io (#2825 ) * add custom pydoc renderer * create an example * revert example code	2022-07-22 10:43:51 +02:00
Sara Zan	48644b23fb	Enable CI on tutorials (#2801 ) * enable ci on tutorials * Disable all path restrictions for safety * actually comment out the paths block * remove comment	2022-07-18 17:59:55 +02:00
Sara Zan	6b39fbd39c	Mocking Pinecone tests (#2778 ) * Integrating the mock into conftest.py * re-enable workflow * delete_all * Update Documentation & Code Style * remove ValueError * Add empty response * wrong condition * return response * revert removal of delete_all * change mock * Update Documentation & Code Style * test for rest api, to revert Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-07-14 20:03:33 +02:00
Massimiliano Pippi	82df677ebf	API tests (#2738 ) * clean up tests and run earlier * use change detection * better naming, skip ES * more cleanup * fix job name * dummy commit to trigger the CI * mock away the PDF converter * make the test compatible with 3.7 * removed leftover * always run the api tests, use a matrix for the OS * refactor all the tests * remove outdated dependency * pylint * new abstract method * adjust for older python versions * rename pipeline file * address PR comments	2022-07-14 15:36:28 +02:00
Sara Zan	091711b8c4	Fix `Tutorials` and `Tutorials (nightly)` (#2737 ) * Remove caching and install audio deps * Fix `Tutorials` as well * Run all tutorials even though some fail * Forgot fi * fix failure condition * proper bash string equality * Enable debug logs * remove audio files * Update Documentation & Code Style * Use the setup action in the Tutorial CI as well * Try with a file that exists * Update Documentation & Code Style * Fix the comments in the tutorials * Update Documentation & Code Style * Fix tutorials.sh * Remove debug logging * import pprint and try editable install * Update Documentation & Code Style * extract no run list * Add tutorial18 to no run list nightly * import pprint correctly * Update Documentation & Code Style * try making site-packages editable * Make pythonpath editable every time Tut17 is run on CI * typo * fix imports in tut5 * add git clean * Update Documentation & Code Style * add comments and remove` -e` * accidentally deleted a line * Update .github/utils/tutorials.sh Co-authored-by: Massimiliano Pippi <mpippi@gmail.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2022-07-12 11:22:17 +02:00
Malte Pietsch	ba08fc86f5	Add node to use OpenAI's GPT-3 for QA (#2605 ) * first draft of openai node for QA * Update Documentation & Code Style * fix mypy. add node to inits * Update Documentation & Code Style * fix linter * Adapt OpenAIGenerator to completions endpoint * Update Documentation & Code Style * Fix pylint * Fix doc strings * Make use of temperature * Make use of api key in tests * Adapt doc strings Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: ZanSara <sarazanzo94@gmail.com> Co-authored-by: bogdankostic <bogdankostic@web.de>	2022-07-08 13:59:27 +02:00
tstadel	2a7c0139f5	double max heap size for elasticsearch in CI (#2756 )	2022-07-05 13:53:32 +02:00
Vladimir Blagojevic	ffb7e4e4bd	GPL tutorial - add GPU header and open in colab button (#2736 ) * GPL tutorial - add GPU header and open in colab button * Add GPL tutorial to run exclusion list	2022-07-04 05:23:39 -04:00
Julian Risch	1781e88802	Upgrade torch to 1.12 (#2741 ) * Upgrade torch to 1.12 * upgrade torch-scatter * add explicit torch-scatter installation * set torch dependency to range >1.9,<1.13	2022-07-01 20:23:32 +02:00
Vladimir Blagojevic	b08c5f81d1	Add GPL adaptation tutorial (#2632 ) * Add GPL adaptation tutorial * Latest round of Aga's corrections * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-06-26 02:44:57 -04:00
Sara Zan	426f49979b	Change `repo` with `repository` in `python_cache` (#2731 ) * Change repo with repository * remove name * using owner and name * use owner name * replace name with login * Trying with the PR context instead	2022-06-24 18:36:19 +02:00
Sara Zan	6a7152044e	add repo name as well (#2729 )	2022-06-24 17:08:28 +02:00
Sara Zan	13514f960d	Speficy ref in action (#2727 )	2022-06-24 15:56:17 +02:00
Sara Zan	400d2cdf77	Fix audio tests on CI (#2718 ) * Update Documentation & Code Style * fix huggingface-hub version Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-06-24 11:36:31 +02:00
Rob Pasternak	b87c0c950b	Tutorial 14 edit (#2663 ) * Rewrite Tutorial 14 for increased user-friendliness * Update Tutorial14 .py file to match .ipynb file * Update Documentation & Code Style * unblock the ci * ignore error in jitterbit/get-changed-files Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sara Zan <sarazanzo94@gmail.com>	2022-06-22 13:03:07 +02:00
Sara Zan	505ababf43	Skip Pinecone tests (#2696 ) * comment out Pinecone tests block * Add comment	2022-06-21 14:49:36 +02:00
Massimiliano Pippi	5d255f0e4a	replace question issue with link to discussions (#2697 )	2022-06-21 14:10:11 +02:00
Sara Zan	a6c06ee376	Update contributor's checklists in PR template (#2659 ) * Split contributor's and reviewer's checklists * contributor-centric checklist * Move issues at the top and split entry * phrasing	2022-06-21 10:11:18 +02:00
Sara Zan	a26c042994	Fix typo in `code_and_docs.sh` (#2662 ) * Fix typo in code_and_docs.sh & install ffmpeg in autoformat.yml * apt update to get ffmpeg * Update Documentation & Code Style * Add header and better error message * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-06-15 13:50:55 +02:00
Sara Zan	776eba0cd1	Remove `pull_request` from triggers (#2661 )	2022-06-15 10:14:22 +02:00
Sara Zan	584e046642	`AnswerToSpeech` (#2584 ) * Add new audio answer primitives * Add AnswerToSpeech * Add dependency group * Update Documentation & Code Style * Extract TextToSpeech in a helper class, create DocumentToSpeech and primitives * Add tests * Update Documentation & Code Style * Add ability to compress audio and more tests * Add audio group to test, all and all-gpu * fix pylint * Update Documentation & Code Style * Accidental git tag * Try pleasing mypy * Update Documentation & Code Style * fix pylint * Add warning for missing OS library and support in CI * Try fixing mypy * Update Documentation & Code Style * Add docs, simplify args for audio nodes and add tutorials * Fix mypy * Fix run_batch * Feedback on tutorials * fix mypy and pylint * Fix mypy again * Fix mypy yet again * Fix the ci * Fix dicts merge and install ffmpeg on CI * Make the audio nodes import safe * Trying to increase tolerance in audio test * Fix import paths * fix linter * Update Documentation & Code Style * Add audio libs in unit tests * Update _text_to_speech.py * Update answer_to_speech.py * Use dedicated dataset & update telemetry * Remove and use distilled roberta * Revert special primitives so that the nodes run in indexing * Improve tutorials and fix smaller bugs * Update Documentation & Code Style * Fix serialization issue * Update Documentation & Code Style * Improve tutorial * Update Documentation & Code Style * Update _text_to_speech.py * Minor lg updates * Minor lg updates to tutorial * Making indexing work in tutorials * Update Documentation & Code Style * Improve docstrings * Try to use GPU when available * Update Documentation & Code Style * Fixi mypy and pylint * Try to pass the device correctly * Update Documentation & Code Style * Use type of device * use .cpu() * Improve .ipynb * update apt index to be able to download libsndfile1 * Fix SpeechDocument.from_dict() * Change pip URL Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>	2022-06-15 10:13:18 +02:00
Sara Zan	735ffa635b	[CI refactoring] Tutorials on CI (#2547 ) * Experimental Ci workflow for running tutorials * Run on every push for now * Not starting? * Disabling paths temporarily * Sort tutorials in natural order * Install ipython * remove ipython install * Try running ipython with sudo * env.pythonLocation * Skipping tutorial2 and 9 for speed * typo * Use one runner per tutorial, for now * Typo in dependend job * Missing quotes broke scripts matrix * Simplify setup for the tutorials, try to prevent containers conflict * Remove needless job dependencies * Try prevent cache issues, fix small Tut10 bug * Missing deps for running notebook tutorials * Create three groups of tutorials excluding the longest among them * remove deps * use proper bash loop * Try with a single string * Fix typo in echo * Forgot do * Typo * Try to make the GraphDB tutorial without launching its own container * Run notebook and script together * Whitespace * separate scrpits and notebooks execution * Run notebooks first * Try caching the GoT data before running the scripts * add note * fix mkdir * Fix path * Update Documentation & Code Style * missing -r * Fix folder numbering * Run notebooks as well * Typo in notebook command * complete path in notebook command * Try with TIKA_LOG_PATH * Fix folder naming * Do not use cached data in Tut9 * extracting the number better * Small tweaks * Same fix on Tut10 on the notebook * Exclude GoT cache for tut5 too * Remove faiss files after tutorial run * Layout * fix remove command * Fix path in tut10 notebook * Fix typo in node name in tut14 * Third block was too long, rebancing * Reduce GoT dataset even more, why wasting time after all... * Fix paths in tut10 again * do git clean to make sure to cleanup everything (breaks post Python) * Remove ES file with bad permission at the end of the run * Split first block, takes >30mins * take out tut15 for a moment, has an actual bug * typo * Forgot rm option * Simply remove all ES files * Improve logs of GoT reduction * Exclude also tut16 from cache to try fix bug * Replace ll with ls * Reintroduce 15_TableQA * Small regrouping * regrouping to make the min num of runners go for about 30mins * Add cron schedule and PR paths conditions * Add some timing information * Separate tutorials by diff and tutorials by cron * temp add pull_request to tutorials nightly * Add badge in README to keep track of the nightly tutorials run * Remove prefixes from data folder names * Add fetch depth to get diff with master * Fix paths again * typo * Exclude long-running ones * Typo * Fix tutorials.yml as well * Use head_ref * Using an action for now * exclude other files * Use only the correct command to run the tutorial * Add long running tutorials in separate runners, just for experiment * Factor out the complex bash script * Pass the python path to the bash script * Fix paths * adding log statement * Missing dollarsign * Resetting variable in loop * using mini GoT dataset and improving bash script * change dataset name Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-06-15 09:53:36 +02:00
Sara Zan	8d7439c623	Move autoformat-check.yml into tests.yml (#2635 )	2022-06-10 18:22:16 +02:00
Sara Zan	9968c373d2	make 'ready for review' an event that triggers the tests (#2643 )	2022-06-09 09:23:38 +02:00
Sara Zan	c2d2faf31e	Add directive in `tests.yml` (#2637 )	2022-06-07 13:31:19 +02:00
Sara Zan	59608ca474	[CI Refactoring] Workflow refactoring (#2576 ) * Unify CI tests (from #2466) * Update Documentation & Code Style * Change folder names * Fix markers list * Remove marker 'slow', replaced with 'integration' * Soften children check * Start ES first so it has time to boot while Python is setup * Run the full workflow * Try to make pip upgrade on Windows * Set KG tests as integration * Update Documentation & Code Style * typo * faster pylint * Make Pylint use the cache * filter diff files for pylint * debug pylint statement * revert pylint changes * Remove path from asserted log (fails on Windows) * Skip preprocessor test on Windows * Tackling Windows specific failures * Fix pytest command for windows suites * Remove \ from command * Move poppler test into integration * Skip opensearch test on windows * Add tolerance in reader sas score for Windows * Another pytorch approx * Raise time limit for unit tests :( * Skip poppler test on Windows CI * Specify to pull with FF only in docs check * temporarily run the docs check immediately * Allow merge commit for now * Try without fetch depth * Accelerating test * Accelerating test * Add repository and ref alongside fetch-depth * Separate out code&docs check from tests * Use setup-python cache * Delete custom action * Remove the pull step in the docs check, will find a way to run on bot commits * Add requirements.txt in .github for caching * Actually install dependencies * Change deps group for pylint * Unclear why the requirements.txt is still required :/ * Fix the code check python setup * Install all deps for pylint * Make the autoformat check depend on tests and doc updates workflows * Try installing dependencies in another order * Try again to install the deps * quoting the paths * Ad back the requirements * Try again to install rest_api and ui * Change deps group * Duplicate haystack install line * See if the cache is the problem * Disable also in mypy, who knows * split the install step * Split install step everywhere * Revert "Separate out code&docs check from tests" This reverts commit 1cd59b15ffc5b984e1d642dcbf4c8ccc2bb6c9bd. * Add back the action * Proactive support for audio (see text2speech branch) * Fix label generator tests * Remove install of libsndfile1 on win temporarily * exclude audio tests on win * install ffmpeg for integration tests Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-06-07 09:23:03 +02:00
Sara Zan	89bb1ca139	[CI refactoring] Improve `autoformat.yml` (#2556 ) * Restructure autoformat to run a single script * Reduce diff for autoforma.yml * Reduce diff on linux_ci.yml	2022-05-18 20:02:43 +02:00
Julian Risch	70ca1e9fc6	Smaller demo instance type (#2564 ) This PR changes the instance type of the public Haystack demo from p3.2xlarge to g4dn.2xlarge. g4dn.2xlarge has 1 GPU, 8 vCPUs, 32 GiB of memory; p3.2xlarge had 1 GPU, 8 vCPUs, 61 GiB of memory, which results in 75% lower costs with g4dn.2xlarge. I also tried out the even smaller g4dn.xlarge, which has 1 GPU, 4 vCPUs, 16 GiB of memory. However, the memory was not enough to run the demo. I tried out multiple requests at the same time and it worked well with g4dn.2xlarge. Requests are slightly slower as with the more powerful instance type but it's hard to notice.	2022-05-17 12:47:15 +02:00
Sara Zan	ff4303c51b	[CI refactoring] Categorize tests into folders (#2554 ) * Categorize tests into folders * Fix linux_ci.yml and an import * Wrong path	2022-05-17 09:55:53 +01:00
Ivan Lopez	a2a99f79b1	Fix docker image tag with semantic version for releases (#2548 ) * Fix docker tag with semantic version for releases * Prepend latest docker tag with tagprefix in cache-from	2022-05-16 13:26:33 +02:00
bogdankostic	300ee1ac83	Upgrade torch version to 1.11 (#2538 ) * Bump torch version * Upgrade torch version in torch-scatter	2022-05-13 14:45:53 +02:00
Sara Zan	15a9ff6f67	PR template mention of enabling Actions (#2523 ) * Update version to 1.4.1rc0 * Add hint of enabling action on the fork in the PR template * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-10 09:46:09 +02:00
Sara Zan	f3e0ba4be9	Fix `OpenSearchDocumentStore`'s `__init__` (#2498 ) * Move super in OpenSearchDocumentStore and add small test * Update Documentation & Code Style * Add Opensearch container to the CI Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-05 10:38:09 +02:00
tstadel	7498c7c6fb	Fix and use delete_index instead of delete_documents in tests (#2453 ) * use delete_index instead of delete_documents in tests * fix delete_index * fix delete_index() in memory and milvus * fix imports * fix memory keyerrors * Update Documentation & Code Style * increase timeout for pinecone tests to 60 minutes * clean get_document_store() * use recreate_index in tests * Update Documentation & Code Style * fix tests * fix remaining tests * log index deleted * fix test_eval_pipeline * simplify existing index detection in weaviate * delete label_index on recreate_index for pinecone and milvus * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-04-26 19:06:30 +02:00
Sara Zan	4eec2dc45e	Change YAML version exception into a warning (#2385 ) * Change exception into warning, add strict_version param, and remove compatibility between schemas * Simplify update_json_schema * Rename unstable into master * Prevent validate_config from changing the config to validate * Fix version validation and add tests * Rename master into ignore * Complete parameter rename Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-04-19 16:08:08 +02:00
Sara Zan	8abf11fbd3	Update `pdftotext` also on `pinecone` and `milvus1` CI jobs (#2433 ) * Upgrade pdftotext also on pinecone and milvus1 jobs * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-04-19 16:06:27 +02:00

153 Commits