haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-12-31 17:17:31 +00:00

Author	SHA1	Message	Date
Vladimir Blagojevic	50f7d660e2	Add slack hook for test failures (#2996)	2022-08-09 08:27:52 -04:00
Massimiliano Pippi	40d07c2038	Enable Opensearch unit tests in Windows CI (#2936) * enable Opensearch unit tests under Win * move unit tests into a dedicated job * skip audio tests on missing dependencies * avoid failing test collection when soundfile is not available * Update .github/workflows/tests.yml Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai> Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>	2022-08-03 19:19:07 +02:00
Sara Zan	669f6f0128	Add git diff to schema checks (#2959)	2022-08-03 09:46:38 -04:00
Vladimir Blagojevic	86d56b4dfe	Add HF model caching for integration tests (#2909) * Add HF model caching for integration tests * Remove windows mode caching - not worth it	2022-07-29 18:17:05 +02:00
Sara Zan	434b1c3682	Disable a few checks in the pre-commit hook (#2929) * Disable small checks giving trouble to pydoc-markdown and JSON Schema * Add instructions for JSON schema generator in the workflow logs	2022-07-29 17:02:56 +02:00
Massimiliano Pippi	e7627c3f8b	Use opensearch-py in OpenSearchDocumentStore (#2691) * add Opensearch extras * let OpenSearchDocumentStore use opensearch-py * Update Documentation & Code Style * fix a bug found after adding tests Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>	2022-07-28 10:04:49 +02:00
Zoltan Fedor	adb2b2c312	Add support for BM25 with the Weaviate document store (#2860 ) * Upgrading Weaviate used for testing to 1.14.1 from 1.11.0 This has also brought up an issue with one of the test filtering for value "a". This test has started to fail, as "a" is a default stopword in Weaviate, so I have changed this test to look for value "c" instead of value "a" to get around the stopword issue. * Weaviate client upgrade From v3.3.3 to v3.6.0 * Adding BM25 Retrieval to Weaviate Weaviate now supports BM25 retrieval in experiment mode and with some limitations (like it cannot be combined with filters). This commit adds support for inverted index (BM25) querying against Weaviate. * Running Black on the recent code changes * Update Documentation & Code Style * Fixing linting issues after code changes by black * The BM25 query needs to be in all lowercase for now The BM25 query needs to be provided all lowercase while the functionality is in experimental mode in Weaviate. See https://app.slack.com/client/T0181DYT9KN/C017EG2SL3H/thread/C017EG2SL3H-1658790227.208119 * Fixing method parameter docstring to highlight that they are not supported in Weaviate * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-07-27 10:07:13 +02:00
Sara Zan	2d65c380f1	pre-commit hooks (#2819 ) * Add pre-commit config * update contributing guidelines * try failing the workflow * add pre-commit to the deps * updating uninstall instructions * separate jobs in CI * make tutorials check fail * make black check fail * make openapi check fail * make yaml schema and api docs checks fail * highlight the instructions * Update .pre-commit-config.yaml Co-authored-by: Tobias Wochinger <mail@tobias-wochinger.de> * Update CONTRIBUTING.md Co-authored-by: Tobias Wochinger <mail@tobias-wochinger.de> * Update CONTRIBUTING.md Co-authored-by: Tobias Wochinger <mail@tobias-wochinger.de> * Use black --check * Add images of the CI * title level * feedback Co-authored-by: Tobias Wochinger <mail@tobias-wochinger.de>	2022-07-26 15:02:15 +02:00
Sara Zan	5d8476eb58	Restart containers in `tutorials.sh` (#2858) * restart tutorials in the loop * remove container steps in tutorials.yml * forgotten quotes * unmatched bracket * give names to containers * try to limit the log size * make the containers restart on the scripts as well * feedback * Raise integration tests timeout * raising limit again	2022-07-25 17:35:36 +02:00
Sara Zan	5119acb260	Raise timeout on integration tests (#2880)	2022-07-25 06:43:20 -04:00
Sara Zan	48644b23fb	Enable CI on tutorials (#2801) * enable ci on tutorials * Disable all path restrictions for safety * actually comment out the paths block * remove comment	2022-07-18 17:59:55 +02:00
Sara Zan	6b39fbd39c	Mocking Pinecone tests (#2778) * Integrating the mock into conftest.py * re-enable workflow * delete_all * Update Documentation & Code Style * remove ValueError * Add empty response * wrong condition * return response * revert removal of delete_all * change mock * Update Documentation & Code Style * test for rest api, to revert Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-07-14 20:03:33 +02:00
Massimiliano Pippi	82df677ebf	API tests (#2738) * clean up tests and run earlier * use change detection * better naming, skip ES * more cleanup * fix job name * dummy commit to trigger the CI * mock away the PDF converter * make the test compatible with 3.7 * removed leftover * always run the api tests, use a matrix for the OS * refactor all the tests * remove outdated dependency * pylint * new abstract method * adjust for older python versions * rename pipeline file * address PR comments	2022-07-14 15:36:28 +02:00
Sara Zan	091711b8c4	Fix `Tutorials` and `Tutorials (nightly)` (#2737 ) * Remove caching and install audio deps * Fix `Tutorials` as well * Run all tutorials even though some fail * Forgot fi * fix failure condition * proper bash string equality * Enable debug logs * remove audio files * Update Documentation & Code Style * Use the setup action in the Tutorial CI as well * Try with a file that exists * Update Documentation & Code Style * Fix the comments in the tutorials * Update Documentation & Code Style * Fix tutorials.sh * Remove debug logging * import pprint and try editable install * Update Documentation & Code Style * extract no run list * Add tutorial18 to no run list nightly * import pprint correctly * Update Documentation & Code Style * try making site-packages editable * Make pythonpath editable every time Tut17 is run on CI * typo * fix imports in tut5 * add git clean * Update Documentation & Code Style * add comments and remove` -e` * accidentally deleted a line * Update .github/utils/tutorials.sh Co-authored-by: Massimiliano Pippi <mpippi@gmail.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2022-07-12 11:22:17 +02:00
Malte Pietsch	ba08fc86f5	Add node to use OpenAI's GPT-3 for QA (#2605) * first draft of openai node for QA * Update Documentation & Code Style * fix mypy. add node to inits * Update Documentation & Code Style * fix linter * Adapt OpenAIGenerator to completions endpoint * Update Documentation & Code Style * Fix pylint * Fix doc strings * Make use of temperature * Make use of api key in tests * Adapt doc strings Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: ZanSara <sarazanzo94@gmail.com> Co-authored-by: bogdankostic <bogdankostic@web.de>	2022-07-08 13:59:27 +02:00
tstadel	2a7c0139f5	double max heap size for elasticsearch in CI (#2756)	2022-07-05 13:53:32 +02:00
Vladimir Blagojevic	ffb7e4e4bd	GPL tutorial - add GPU header and open in colab button (#2736) * GPL tutorial - add GPU header and open in colab button * Add GPL tutorial to run exclusion list	2022-07-04 05:23:39 -04:00
Julian Risch	1781e88802	Upgrade torch to 1.12 (#2741) * Upgrade torch to 1.12 * upgrade torch-scatter * add explicit torch-scatter installation * set torch dependency to range >1.9,<1.13	2022-07-01 20:23:32 +02:00
Sara Zan	400d2cdf77	Fix audio tests on CI (#2718) * Update Documentation & Code Style * fix huggingface-hub version Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-06-24 11:36:31 +02:00
Rob Pasternak	b87c0c950b	Tutorial 14 edit (#2663) * Rewrite Tutorial 14 for increased user-friendliness * Update Tutorial14 .py file to match .ipynb file * Update Documentation & Code Style * unblock the ci * ignore error in jitterbit/get-changed-files Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sara Zan <sarazanzo94@gmail.com>	2022-06-22 13:03:07 +02:00
Sara Zan	505ababf43	Skip Pinecone tests (#2696) * comment out Pinecone tests block * Add comment	2022-06-21 14:49:36 +02:00
Sara Zan	a26c042994	Fix typo in `code_and_docs.sh` (#2662) * Fix typo in code_and_docs.sh & install ffmpeg in autoformat.yml * apt update to get ffmpeg * Update Documentation & Code Style * Add header and better error message * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-06-15 13:50:55 +02:00
Sara Zan	776eba0cd1	Remove `pull_request` from triggers (#2661)	2022-06-15 10:14:22 +02:00
Sara Zan	584e046642	`AnswerToSpeech` (#2584 ) * Add new audio answer primitives * Add AnswerToSpeech * Add dependency group * Update Documentation & Code Style * Extract TextToSpeech in a helper class, create DocumentToSpeech and primitives * Add tests * Update Documentation & Code Style * Add ability to compress audio and more tests * Add audio group to test, all and all-gpu * fix pylint * Update Documentation & Code Style * Accidental git tag * Try pleasing mypy * Update Documentation & Code Style * fix pylint * Add warning for missing OS library and support in CI * Try fixing mypy * Update Documentation & Code Style * Add docs, simplify args for audio nodes and add tutorials * Fix mypy * Fix run_batch * Feedback on tutorials * fix mypy and pylint * Fix mypy again * Fix mypy yet again * Fix the ci * Fix dicts merge and install ffmpeg on CI * Make the audio nodes import safe * Trying to increase tolerance in audio test * Fix import paths * fix linter * Update Documentation & Code Style * Add audio libs in unit tests * Update _text_to_speech.py * Update answer_to_speech.py * Use dedicated dataset & update telemetry * Remove and use distilled roberta * Revert special primitives so that the nodes run in indexing * Improve tutorials and fix smaller bugs * Update Documentation & Code Style * Fix serialization issue * Update Documentation & Code Style * Improve tutorial * Update Documentation & Code Style * Update _text_to_speech.py * Minor lg updates * Minor lg updates to tutorial * Making indexing work in tutorials * Update Documentation & Code Style * Improve docstrings * Try to use GPU when available * Update Documentation & Code Style * Fixi mypy and pylint * Try to pass the device correctly * Update Documentation & Code Style * Use type of device * use .cpu() * Improve .ipynb * update apt index to be able to download libsndfile1 * Fix SpeechDocument.from_dict() * Change pip URL Co-authored-by: github-actions[bot] 
<41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>	2022-06-15 10:13:18 +02:00
Sara Zan	735ffa635b	[CI refactoring] Tutorials on CI (#2547 ) * Experimental Ci workflow for running tutorials * Run on every push for now * Not starting? * Disabling paths temporarily * Sort tutorials in natural order * Install ipython * remove ipython install * Try running ipython with sudo * env.pythonLocation * Skipping tutorial2 and 9 for speed * typo * Use one runner per tutorial, for now * Typo in dependend job * Missing quotes broke scripts matrix * Simplify setup for the tutorials, try to prevent containers conflict * Remove needless job dependencies * Try prevent cache issues, fix small Tut10 bug * Missing deps for running notebook tutorials * Create three groups of tutorials excluding the longest among them * remove deps * use proper bash loop * Try with a single string * Fix typo in echo * Forgot do * Typo * Try to make the GraphDB tutorial without launching its own container * Run notebook and script together * Whitespace * separate scrpits and notebooks execution * Run notebooks first * Try caching the GoT data before running the scripts * add note * fix mkdir * Fix path * Update Documentation & Code Style * missing -r * Fix folder numbering * Run notebooks as well * Typo in notebook command * complete path in notebook command * Try with TIKA_LOG_PATH * Fix folder naming * Do not use cached data in Tut9 * extracting the number better * Small tweaks * Same fix on Tut10 on the notebook * Exclude GoT cache for tut5 too * Remove faiss files after tutorial run * Layout * fix remove command * Fix path in tut10 notebook * Fix typo in node name in tut14 * Third block was too long, rebancing * Reduce GoT dataset even more, why wasting time after all... 
* Fix paths in tut10 again * do git clean to make sure to cleanup everything (breaks post Python) * Remove ES file with bad permission at the end of the run * Split first block, takes >30mins * take out tut15 for a moment, has an actual bug * typo * Forgot rm option * Simply remove all ES files * Improve logs of GoT reduction * Exclude also tut16 from cache to try fix bug * Replace ll with ls * Reintroduce 15_TableQA * Small regrouping * regrouping to make the min num of runners go for about 30mins * Add cron schedule and PR paths conditions * Add some timing information * Separate tutorials by diff and tutorials by cron * temp add pull_request to tutorials nightly * Add badge in README to keep track of the nightly tutorials run * Remove prefixes from data folder names * Add fetch depth to get diff with master * Fix paths again * typo * Exclude long-running ones * Typo * Fix tutorials.yml as well * Use head_ref * Using an action for now * exclude other files * Use only the correct command to run the tutorial * Add long running tutorials in separate runners, just for experiment * Factor out the complex bash script * Pass the python path to the bash script * Fix paths * adding log statement * Missing dollarsign * Resetting variable in loop * using mini GoT dataset and improving bash script * change dataset name Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-06-15 09:53:36 +02:00
Sara Zan	8d7439c623	Move autoformat-check.yml into tests.yml (#2635)	2022-06-10 18:22:16 +02:00
Sara Zan	9968c373d2	make 'ready for review' an event that triggers the tests (#2643)	2022-06-09 09:23:38 +02:00
Sara Zan	c2d2faf31e	Add directive in `tests.yml` (#2637)	2022-06-07 13:31:19 +02:00
Sara Zan	59608ca474	[CI Refactoring] Workflow refactoring (#2576 ) * Unify CI tests (from #2466) * Update Documentation & Code Style * Change folder names * Fix markers list * Remove marker 'slow', replaced with 'integration' * Soften children check * Start ES first so it has time to boot while Python is setup * Run the full workflow * Try to make pip upgrade on Windows * Set KG tests as integration * Update Documentation & Code Style * typo * faster pylint * Make Pylint use the cache * filter diff files for pylint * debug pylint statement * revert pylint changes * Remove path from asserted log (fails on Windows) * Skip preprocessor test on Windows * Tackling Windows specific failures * Fix pytest command for windows suites * Remove \ from command * Move poppler test into integration * Skip opensearch test on windows * Add tolerance in reader sas score for Windows * Another pytorch approx * Raise time limit for unit tests :( * Skip poppler test on Windows CI * Specify to pull with FF only in docs check * temporarily run the docs check immediately * Allow merge commit for now * Try without fetch depth * Accelerating test * Accelerating test * Add repository and ref alongside fetch-depth * Separate out code&docs check from tests * Use setup-python cache * Delete custom action * Remove the pull step in the docs check, will find a way to run on bot commits * Add requirements.txt in .github for caching * Actually install dependencies * Change deps group for pylint * Unclear why the requirements.txt is still required :/ * Fix the code check python setup * Install all deps for pylint * Make the autoformat check depend on tests and doc updates workflows * Try installing dependencies in another order * Try again to install the deps * quoting the paths * Ad back the requirements * Try again to install rest_api and ui * Change deps group * Duplicate haystack install line * See if the cache is the problem * Disable also in mypy, who knows * split the install step * Split 
install step everywhere * Revert "Separate out code&docs check from tests" This reverts commit 1cd59b15ffc5b984e1d642dcbf4c8ccc2bb6c9bd. * Add back the action * Proactive support for audio (see text2speech branch) * Fix label generator tests * Remove install of libsndfile1 on win temporarily * exclude audio tests on win * install ffmpeg for integration tests Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-06-07 09:23:03 +02:00
Sara Zan	89bb1ca139	[CI refactoring] Improve `autoformat.yml` (#2556) * Restructure autoformat to run a single script * Reduce diff for autoforma.yml * Reduce diff on linux_ci.yml	2022-05-18 20:02:43 +02:00
Julian Risch	70ca1e9fc6	Smaller demo instance type (#2564) This PR changes the instance type of the public Haystack demo from p3.2xlarge to g4dn.2xlarge. g4dn.2xlarge has 1 GPU, 8 vCPUs, and 32 GiB of memory; p3.2xlarge had 1 GPU, 8 vCPUs, and 61 GiB of memory. The switch results in 75% lower costs with g4dn.2xlarge. I also tried out the even smaller g4dn.xlarge, which has 1 GPU, 4 vCPUs, and 16 GiB of memory. However, the memory was not enough to run the demo. I tried out multiple requests at the same time and it worked well with g4dn.2xlarge. Requests are slightly slower than with the more powerful instance type, but it's hard to notice.	2022-05-17 12:47:15 +02:00
Sara Zan	ff4303c51b	[CI refactoring] Categorize tests into folders (#2554) * Categorize tests into folders * Fix linux_ci.yml and an import * Wrong path	2022-05-17 09:55:53 +01:00
Ivan Lopez	a2a99f79b1	Fix docker image tag with semantic version for releases (#2548) * Fix docker tag with semantic version for releases * Prepend latest docker tag with tagprefix in cache-from	2022-05-16 13:26:33 +02:00
bogdankostic	300ee1ac83	Upgrade torch version to 1.11 (#2538) * Bump torch version * Upgrade torch version in torch-scatter	2022-05-13 14:45:53 +02:00
Sara Zan	f3e0ba4be9	Fix `OpenSearchDocumentStore`'s `__init__` (#2498) * Move super in OpenSearchDocumentStore and add small test * Update Documentation & Code Style * Add Opensearch container to the CI Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-05-05 10:38:09 +02:00
tstadel	7498c7c6fb	Fix and use delete_index instead of delete_documents in tests (#2453 ) * use delete_index instead of delete_documents in tests * fix delete_index * fix delete_index() in memory and milvus * fix imports * fix memory keyerrors * Update Documentation & Code Style * increase timeout for pinecone tests to 60 minutes * clean get_document_store() * use recreate_index in tests * Update Documentation & Code Style * fix tests * fix remaining tests * log index deleted * fix test_eval_pipeline * simplify existing index detection in weaviate * delete label_index on recreate_index for pinecone and milvus * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-04-26 19:06:30 +02:00
Sara Zan	8abf11fbd3	Update `pdftotext` also on `pinecone` and `milvus1` CI jobs (#2433) * Upgrade pdftotext also on pinecone and milvus1 jobs * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-04-19 16:06:27 +02:00
Sara Zan	ba9c976bfe	Update `pdftotext` link (#2432) * Update pdftotext link * Update Documentation & Code Style * Update Tutorial8_Preprocessing.ipynb Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-04-19 14:30:18 +02:00
Sara Zan	1a81080e8a	Add `apt update` in Linux CI (#2415) * Update linux_ci.yml	2022-04-13 15:35:56 +02:00
tstadel	ab8ba75664	Set ci job timeout to 45 minutes (#2401)	2022-04-11 16:28:26 +02:00
Sara Zan	ae712fe6bf	Upgrade `weaviate-client` to `3.3.3` and fix `get_all_documents` (#1895 ) * Fix 'bug' on Weaviate only returning max. 100 docs on get_all_documents * Add type * Update Weaviate version on the CI * Fix bug on get_document_count where there are no documents * Add more info in the docstrings of get_all_documents and get_all_documents_generator * Add latest docstring and tutorial changes * Apply Black * Update Documentation & Code Style * Trigger pipeline * Update Documentation & Code Style * Include StefanBogdan feedback * Fix mypy issues and LogicalFilterClause * Add more types * Update Documentation & Code Style * update setup.cfg * Upgrade weaviate containers too * Allow to filter for content field in Weaviate * Use convert_to_weaviate instead of convert_to_pinecone * Fix _get_all_documents_in_index * Update docstrings and docs * Catching an exception in get_document(s)_by_id Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: bogdankostic <bogdankostic@web.de>	2022-04-01 15:37:34 +03:00
bogdankostic	834f8c4902	Change return types of indexing pipeline nodes (#2342 ) * Change return types of file converters * Change return types of preprocessor * Change return types of crawler * Adapt utils to functions to new return types * Adapt __init__.py to new method names * Prevent circular imports * Update Documentation & Code Style * Let DocStores' run method accept Documents * Adapt tests to new return types * Update Documentation & Code Style * Put "# type: ignore" to right place * Remove id_hash_keys property from Document primitive * Update Documentation & Code Style * Adapt tests to new return types and missing id_hash_keys property * Fix mypy * Fix mypy * Adapt PDFToTextOCRConverter * Remove id_hash_keys from RestAPI tests * Update Documentation & Code Style * Rename tests * Remove redundant setting of content_type="text" * Add DeprecationWarning * Add id_hash_keys to elasticsearch_index_to_document_store * Change document type from dict to Docuemnt in PreProcessor test * Fix file path in Tutorial 5 * Remove added output in Tutorial 5 * Update Documentation & Code Style * Fix file_paths in Tutorial 9 + fix gz files in fetch_archive_from_http * Adapt tutorials to new return types * Adapt tutorial 14 to new return types * Update Documentation & Code Style * Change assertions to HaystackErrors * Import HaystackError correctly Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-03-29 13:53:35 +02:00
bogdankostic	7e6ff8a205	Run Pinecone tests only if files related to Pinecone changed (#2343) * Run Pinecone tests only if files related to Pinecone changed * Change in pinecone.py that will be reverted * Revert change in pinecone.py * Test Pinecone also when filter_utils.py changes	2022-03-22 15:58:12 +01:00
James Briggs	8cd73a9d20	Add `PineconeDocumentStore` (#2254 ) * added core install and functionality of pinecone doc store (init, upsert, query, delete) * implemented core functionality of Pinecone doc store * Update Documentation & Code Style * updated filtering to use Haystack filtering and reduced default batch_size * Update Documentation & Code Style * removed debugging code * updated Pinecone filtering to use filter_utils * removed uneeded methods and minor tweaks to current methods * fixed typing issues * Update Documentation & Code Style * Allow filters in al methods except get_embedding_count * Fix skipping document store tests * Update Documentation & Code Style * Fix handling of Milvus1 and Milvus2 in tests * Update Documentation & Code Style * Fix handling of Milvus1 and Milvus2 in tests * Update Documentation & Code Style * Remove SQL from tests requiring embeddings * Update Documentation & Code Style * Fix get_embedding_count of Milvus2 * Make sure to start Milvus2 tests with a new collection * Add pinecone to test suite * Update Documentation & Code Style * Fix typing * Update Documentation & Code Style * Add pinecone to docstores dependendcy * Add PineconeDocStore to API Documentation * Add missing comma * Update Documentation & Code Style * Adapt format of doc strings * Update Documentation & Code Style * Set API key as environment variable * Skip Pinecone tests in forks * Add sleep after deleting index * Add sleep after deleting index * Add sleep after creating index * Add check if index ready * Remove printing of index stats * Create new index for each pinecone test * Use RestAPI instead of Python API for describe_index_stats * Fix accessing describe_index_stats * Remove usages of describe_index_stats * Run pinecone tests separately * Update Documentation & Code Style * Add pdftotext to pinecone tests * Remove sleep from doc store fixture * Add describe_index_stats * Remove unused imports * Use pull_request_target trigger * Revert use 
pull_request_target trigger * Remove set_config * Add os to conftest * Integrate review comments * Set include_values to False * Remove quotation marks from pinecone.Index type * Update Documentation & Code Style * Update Documentation & Code Style * Fix number of args in error messages Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: bogdankostic <bogdankostic@web.de>	2022-03-21 16:24:09 +01:00
Julian Risch	ac5617e757	Add basic telemetry features (#2314 ) * add basic telemetry features * change pipeline_config to _component_config * Update Documentation & Code Style * add super().__init__() calls to error classes * make posthog mock work with python 3.7 * Update Documentation & Code Style * update link to docs web page * log exceptions, send event for raised HaystackErrors, refactor Path(CONFIG_PATH) * add comment on send_event in BaseComponent.init() and fix mypy * mock NonPrivateParameters and fix pylint undefined-variable * Update Documentation & Code Style * check model path contains multiple / * add test for writing to file * add test for en-/disable telemetry * Update Documentation & Code Style * merge file deletion methods and ignore pylint global statement * Update Documentation & Code Style * set env variable in demo to activate telemetry * fix mock of HAYSTACK_TELEMETRY_ENABLED * fix mypy and linter * add CI as env variable to execution contexts * remove threading, add test for custom error event * Update Documentation & Code Style * simplify config/log file deletion * add test for final event being sent * force writing config file in test * make test compatible with python 3.7 * switch to posthog production server * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-03-21 11:58:51 +01:00
Sara Zan	11cf94a965	Pipeline's YAML: syntax validation (#2226 ) * Add BasePipeline.validate_config, BasePipeline.validate_yaml, and some new custom exception classes * Make error composition work properly * Clarify typing * Help mypy a bit more * Update Documentation & Code Style * Enable autogenerated docs for Milvus1 and 2 separately * Revert "Enable autogenerated docs for Milvus1 and 2 separately" This reverts commit 282be4a78a6e95862a9b4c924fc3dea5ca71e28d. * Update Documentation & Code Style * Re-enable 'additionalProperties: False' * Add pipeline.type to JSON Schema, was somehow forgotten * Disable additionalProperties on the pipeline properties too * Fix json-schemas for 1.1.0 and 1.2.0 (should not do it again in the future) * Cal super in PipelineValidationError * Improve _read_pipeline_config_from_yaml's error handling * Fix generate_json_schema.py to include document stores * Fix json schemas (retro-fix 1.1.0 again) * Improve custom errors printing, add link to docs * Add function in BaseComponent to list its subclasses in a module * Make some document stores base classes abstract * Add marker 'integration' in pytest flags * Slighly improve validation of pipelines at load * Adding tests for YAML loading and validation * Make custom_query Optional for validation issues * Fix bug in _read_pipeline_config_from_yaml * Improve error handling in BasePipeline and Pipeline and add DAG check * Move json schema generation into haystack/nodes/_json_schema.py (useful for tests) * Simplify errors slightly * Add some YAML validation tests * Remove load_from_config from BasePipeline, it was never used anyway * Improve tests * Include json-schemas in package * Fix conftest imports * Make BasePipeline abstract * Improve mocking by making the test independent from the YAML version * Add exportable_to_yaml decorator to forget about set_config on mock nodes * Fix mypy errors * Comment out one monkeypatch * Fix typing again * Improve error message for validation * Add 
required properties to pipelines * Fix YAML version for REST API YAMLs to 1.2.0 * Fix load_from_yaml call in load_from_deepset_cloud * fix HaystackError.__getattr__ * Add super().__init__()in most nodes and docstore, comment set_config * Remove type from REST API pipelines * Remove useless init from doc2answers * Call super in Seq3SeqGenerator * Typo in deepsetcloud.py * Fix rest api indexing error mismatch and mock version of JSON schema in all tests * Working on pipeline tests * Improve errors printing slightly * Add back test_pipeline.yaml * _json_schema.py supports different versions with identical schemas * Add type to 0.7 schema for backwards compatibility * Fix small bug in _json_schema.py * Try alternative to generate json schemas on the CI * Update Documentation & Code Style * Make linux CI match autoformat CI * Fix super-init-not-called * Accidentally committed file * Update Documentation & Code Style * fix test_summarizer_translation.py's import * Mock YAML in a few suites, split and simplify test_pipeline_debug_and_validation.py::test_invalid_run_args * Fix json schema for ray tests too * Update Documentation & Code Style * Reintroduce validation * Usa unstable version in tests and rest api * Make unstable support the latest versions * Update Documentation & Code Style * Remove needless fixture * Make type in pipeline optional in the strings validation * Fix schemas * Fix string validation for pipeline type * Improve validate_config_strings * Remove type from test p[ipelines * Update Documentation & Code Style * Fix test_pipeline * Removing more type from pipelines * Temporary CI patc * Fix issue with exportable_to_yaml never invoking the wrapped init * rm stray file * pipeline tests are green again * Linux CI now needs .[all] to generate the schema * Bugfixes, pipeline tests seems to be green * Typo in version after merge * Implement missing methods in Weaviate * Trying to avoid FAISS tests from running in the Milvus1 test suite * Fix some stray test 
paths and faiss index dumping * Fix pytest markers list * Temporarily disable cache to be able to see tests failures * Fix pyproject.toml syntax * Use only tmp_path * Fix preprocessor signature after merge * Fix faiss bug * Fix Ray test * Fix documentation issue by removing quotes from faiss type * Update Documentation & Code Style * use document properly in preprocessor tests * Update Documentation & Code Style * make preprocessor capable of handling documents * import document * Revert support for documents in preprocessor, do later * Fix bug in _json_schema.py that was breaking validation * re-enable cache * Update Documentation & Code Style * Simplify calling _json_schema.py from the CI * Remove redundant ABC inheritance * Ensure exportable_to_yaml works only on implementations * Rename subclass to class_ in Meta * Make run() and get_config() abstract in BasePipeline * Revert unintended change in preprocessor * Move outgoing_edges_input_node check inside try block * Rename VALID_CODE_GEN_INPUT_REGEX into VALID_INPUT_REGEX * Add check for a RecursionError on validate_config_strings * Address usages of _pipeline_config in data silo and elasticsearch * Rename _pipeline_config into _init_parameters * Fix pytest marker and remove unused imports * Remove most redundant ABCs * Rename _init_parameters into _component_configuration * Remove set_config and type from _component_configuration's dict * Remove last instances of set_config and replace with super().__init__() * Implement __init_subclass__ approach * Simplify checks on the existence of _component_configuration * Fix faiss issue * Dynamic generation of node schemas & weed out old schemas * Add debatable test * Add docstring to debatable test * Positive diff between schemas implemented * Improve diff printing * Rename REST API YAML files to trigger IDE validation * Fix typing issues * Fix more typing * Typo in YAML filename * Remove needless type:ignore * Add tests * Fix tests & validation feedback for accessory 
classes in custom nodes * Refactor RAGeneratorType out * Fix broken import in conftest * Improve source error handling * Remove unused import in test_eval.py breaking tests * Fix changed error message in tests matches too * Normalize generate_openapi_specs.py and generate_json_schema.py in the actions * Fix path to generate_openapi_specs.py in autoformat.yml * Update Documentation & Code Style * Add test for FAISSDocumentStore-like situations (superclass with init params) * Update Documentation & Code Style * Fix indentation * Remove commented set_config * Store model_name_or_path in FARMReader to use in DistillationDataSilo * Rename _component_configuration into _component_config * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-03-15 11:17:26 +01:00
Sara Zan	982ec4435e	Make windows CI more resistant to cache misses (#2263)	2022-03-10 15:11:34 +01:00
Sara Zan	18a6545055	Create milvus2 containers outside of haystack/ (#2300)	2022-03-10 14:55:15 +01:00
MichelBartels	2c423ba063	Introduce support for pymilvus>=2.0.0 (#2126 ) * update remaining occurences of get_connection * fix milvus2 import and fix wrong extra references * change MilvusDocumentStore to Milvus1DocumentStore * update milvus docstrings to reflect updated dependency management * enable milvus 2 tests * fix milvus2 env variable processing * fix dropping collections for each milvus 2 test * make Milvus 2 doc store tests work * allow user to specify consistency level * Fist attempt at running Milvus2 in the CI * Install the correct pymilvus * add batch deletion for milvus2 * change default from milvus 1 to milvus 2 * make milvus2 the default in the docstores extra * Switch milvus1 and milvus2 in base test run on CI * Rename docstore flags for pytest: 'milvus'->'milvus1', 'milvus2'->'milvus' * Rename milvus.py->milvus1.py and milvus2x.py->milvus2.py * Enable autogenerated docs for Milvus1 and 2 separately * Partial fix to docstring of Milvus2DocumentStore Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Michel Bartels <kontakt@michelbartels.com> Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>	2022-02-24 17:43:38 +01:00
Sara Zan	15c70bdb9f	Generate `haystack-pipeline-1.2.0.schema.json` (#2239) * Trigger generation of the json schema for 1.2.0 * Remove path filters for `autoformat.yml` Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-02-24 11:45:21 +01:00

124 Commits