haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-10-20 20:39:04 +00:00

Author	SHA1	Message	Date
Madeesh Kannan	a5189dd035	fix!: `InMemoryBM25Retriever` no longer returns documents that have a score of 0.0 (#6717 ) * fix!: `InMemoryBM25Retriever` no longer returns documents that have a score of 0.0 Also update tests to accommodate the new behavior. * Remove superfluous code	2024-01-12 17:50:55 +01:00
Massimiliano Pippi	e1ec4e5e4d	refact!: Remove symbols under the `haystack.document_stores` namespace (#6714 ) * remove symbols under the haystack.document_stores namespace * Update haystack/document_stores/types/protocol.py Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> * fix * same for retrievers * leftovers * more leftovers * add relnote * leftovers * one more * fix examples --------- Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>	2024-01-10 21:20:42 +01:00
Massimiliano Pippi	7c05f37a53	remove unit marker (#6450 )	2023-11-29 19:24:25 +01:00
Silvano Cerza	831d0611d9	feat: Change default `DuplicatePolicy` in `DocumentStore.write_documents()` (#6438 ) * Change default DuplicatePolicy in DocumentStore.write_documents() * Add release notes	2023-11-28 12:30:17 +01:00
Silvano Cerza	e6637f5ec2	Fix all tests	2023-11-24 14:48:43 +01:00
Massimiliano Pippi	8adb8bbab8	Remove preview folder in test/ --------- Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>	2023-11-24 11:52:55 +01:00
Massimiliano Pippi	09e7831f60	clean up 1.x code --------- Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>	2023-11-24 11:47:47 +01:00
Ivana Zeljkovic	2326f2f9fe	feat: Pinecone document store optimizations (#5902 ) * Optimize methods for deleting documents and getting vector count. Enable warning messages when Pinecone limits are exceeded on Starter index type. * Fix typo * Add release note * Fix mypy errors * Remove unused import. Fix warning logging message. * Update release note with description about limits for Starter index type in Pinecone * Improve code base by: - Adding new test cases for get_embedding_count method - Fixing get_embedding_count method - Improving delete documents - Fix label retrieval - Increase default batch size - Improve get_document_count method * Remove unused variable * Fix mypy issues	2023-10-16 19:26:24 +02:00
Christian Clauss	bf6d306d68	ci: Simplify Python code with ruff rules SIM (#5833 ) * ci: Simplify Python code with ruff rules SIM * Revert #5828 * ruff --select=I --fix haystack/modeling/infer.py --------- Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2023-09-20 08:32:44 +02:00
Christian Clauss	91ab90a256	perf: Python performance improvements with ruff C4 and PERF fixes (#5803 ) * Python performance improvements with ruff C4 and PERF * pre-commit fixes * Revert changes to examples/basic_qa_pipeline.py * Revert changes to haystack/preview/testing/document_store.py * revert releasenotes * Upgrade to ruff v0.0.290	2023-09-16 16:26:07 +02:00
Christian Clauss	1bc03ddc73	ci: Fix all ruff pyflakes errors except unused imports (#5820 ) * ci: Fix all ruff pyflakes errors except unused imports * Delete releasenotes/notes/fix-some-pyflakes-errors-69a1106efa5d0203.yaml	2023-09-15 18:30:33 +02:00
Christian Clauss	9405eb90ee	ci: Fix invalid escape sequences in Python code (#5802 ) * ci: Use ruff in pre-commit to further limit complexity * Fix invalid escape sequences in Python code * Delete releasenotes/notes/ruff-4d2504d362035166.yaml	2023-09-14 16:42:48 +02:00
Ivana Zeljkovic	4bad202197	feat: Pinecone document store refactoring (#5725 ) * Refactor codebase so that doc_type metadata is used instead of namespaces for making distinction between documents without embeddings, documents with embeddings and labels * Fix parameter name in integration test * Remove code under comment in add_type_metadata_filter method * Fix mypy and pylint checks * Add release note * Apply minimal changes: rename method, update method docs and remove redundant method * Mypy fixes * Fix docstrings * Revert helper methods for fetching documents when the number of documents exceeds Pinecone limit * Remove unnecessary attributes in PineconeDocumentStore * Fix unit test --------- Co-authored-by: Ivana Zeljkovic <ivana.zeljkovic@smartcat.io> Co-authored-by: DosticJelena <jelena.dostic@smartcat.io>	2023-09-14 11:46:47 +02:00
Christian Clauss	6dd52d91b2	ci: Fix typos discovered by codespell (#5778 ) * Fix typos discovered by codespell * pylint: max-args = 38	2023-09-13 16:14:45 +02:00
tstadel	d46c84bb61	feat: support dynamic filters in custom_query (#5427 ) * support filters in custom_query * better tests * Update docstrings --------- Co-authored-by: agnieszka-m <amarzec13@gmail.com>	2023-08-08 15:48:15 +02:00
bogdankostic	237d67dbfd	feat: Check version of Elasticsearch server and add support for Elasticsearch <= 7.5 (#5320 ) * Check ES server version + add support for ES <= 7.5 * Adapt comment * PR feedback	2023-07-13 14:50:43 +02:00
bogdankostic	b7f683bfa4	ci: Add unit test for Elasticsearch8 (#5300 ) * Add job for ES8 integration tests * Add unit test for Elasticsearch 8 * Add tests.yml * Adapt tests.yml * Remove added white space * Adapt tests.yml * Adapt tests.yml * Add dependencies to unit test name * Adapt unit test matrix * Adapt unit test matrix * Adapt unit test matrix * Adapt unit test matrix * Update tests.yml * Create separate tests where necessary * Fix skip * Adapt tests	2023-07-10 16:03:50 +02:00
tstadel	9acb275680	fix: avoid conflicts with opensearch / elasticsearch magic attributes during bulk requests (#5113 ) * use _source on opensearch bulk requests * fix label bulk requests * add tests * fix test * apply feedback --------- Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>	2023-07-07 15:12:50 +02:00
Vladimir Blagojevic	1066e959a2	bug: fix for pinecone not working for per document updates (#5110 )	2023-07-03 14:07:52 +02:00
Stefano Fiorucci	1be39367ac	Fix: `FAISSDocumentStore` - make `write_documents` properly work in combination w `update_embeddings` (#5221 ) * Update VERSION.txt * first draft * simplify method and test * rm unnecessary pb.close * integrate feedback	2023-07-03 10:07:36 +02:00
Massimiliano Pippi	cb638af0ff	refactor: fix method type and add comments (#5235 ) * fix method type and add comments * fix tests	2023-06-30 11:55:52 +02:00
Massimiliano Pippi	037e4f24ce	refactor: add a new Document Store supporting Elasticsearch 8 (#5231 ) * introduce es8 * prepare tests * fix unit tests * adjust tests * install elastic_transport package * make mypy happy * fix opensearch tests	2023-06-29 16:40:10 +02:00
bogdankostic	8c63e295f4	fix: Allow filtering on list fields in `InMemoryDocumentStore` with all operators (#5208 ) * Add support for list fields * Unskip tests	2023-06-29 12:10:39 +02:00
Massimiliano Pippi	6373e2ea66	refactor: prepare support to Elasticsearch 8 (#5226 ) * make a package * Update haystack/document_stores/elasticsearch/es7.py Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> * do not expose ES types from the package --------- Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>	2023-06-29 11:06:20 +02:00
Shukri	916e8452f5	feat!: simplify weaviate auth (#5115 ) * feat!: simplify weaviate auth * docs: explain param precedence * refactor: simplify _get_embedded_options	2023-06-19 15:46:58 +02:00
Ben Heckmann	60e5d73424	fix: changing document scores (#5090 ) * #4653 fix changing scores by returning new document objects from document store queries * added integration test for InMemoryDocumentStore demonstrating the desired behavior * Update test/document_stores/test_memory.py	2023-06-14 17:35:46 +02:00
bogdankostic	da1f245a84	feat: Add batch_size parameter and cast timeout_config value to tuple for `WeaviateDocumentStore` (#5079 ) * Add batch_size parameter and cast timeout_config to tuple * Add unit test * Remove debug tqdm * Remove debug tqdm introduced in #5063	2023-06-06 17:06:10 +02:00
bogdankostic	a9a49e2c0a	feat: Add batching for querying in `ElasticsearchDocumentStore` and `OpenSearchDocumentStore` (#5063 ) * Include benchmark config in output * Use queries from aggregated labels * Introduce batching for querying in ElasticsearchDocStore and OpenSearchDocStore * Fix mypy * Use self.batch_size in write_documents * Use 10_000 as default batch size * Add unit tests for write documents	2023-06-01 18:47:24 +02:00
Massimiliano Pippi	c6ea542b57	chore: remove BaseKnowledgeGraph (#4953 ) * remove BaseKnowledgeGraph * fix pylint	2023-05-21 10:42:02 +02:00
Massimiliano Pippi	4974bf7ab3	chore: remove deprecated MilvusDocumentStore (#4951 ) * remove deprecated MilvusDocumentStore * remove leftovers * fix pylint	2023-05-19 16:37:38 +02:00
Massimiliano Pippi	85254fe9f6	leftover from merge conflict (#4962 )	2023-05-19 16:10:26 +02:00
Massimiliano Pippi	58acef77c4	avoid importing the weaviate client directly (#4945 )	2023-05-18 16:08:53 +02:00
Shukri	ad162f2e65	feat: Support authentication using AuthBearerToken and AuthClientCredentials in Weaviate (#4028 ) * refactor: make the scope param configurable the scope parameter is used when authenticating using AuthClientPassword and AuthClientCredentials * feat: add support for AuthClientCredentials add support for authenticating using the OIDC Client Credentials authentication flow * feat: add support for AuthBearerToken Add support for authenticating using OIDC and bearer tokens * Update lg * refactor how client is built Signed-off-by: hsm207 <hsm207@users.noreply.github.com> * unit test the auth methods Signed-off-by: hsm207 <hsm207@users.noreply.github.com> * Update test_weaviate.py * revert formatting change * Fix type hints --------- Signed-off-by: hsm207 <hsm207@users.noreply.github.com> Co-authored-by: John Doe <johndoe@example.com> Co-authored-by: agnieszka-m <amarzec13@gmail.com> Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2023-05-18 10:17:11 +02:00
bogdankostic	43509c88bf	fix: Add support for `_split_overlap` meta to Pinecone and `dict` metadata in general to Weaviate (#4805 ) * Add support for dicts to Weaviate * Add support for _split_overlap to Pinecone * Add tests * Fix Pylint * Fix Pylint * Fix test * Implement PR feedback	2023-05-05 11:20:21 +02:00
bogdankostic	c7a20d68d2	fix: Add separate query method for OpenSearchDocumentStore (#4764 ) * Add separate query method for OpenSearchDocumentStore * Convert integration test to unit test + add separate tests for OpenSearch	2023-04-26 21:58:33 +02:00
ZanSara	1b57b96210	refactor!: extract `elasticsearch` (#4668 ) * extract elasticsearch * update pyproject.toml * make more import optional * move MockBaseRetriever in conftest * install es in the es integration tests	2023-04-26 10:14:20 +02:00
Massimiliano Pippi	0c081f19e2	fix: remove warnings from the more recent Elasticsearch client (#4602 ) * clean up the ES instance in a more robust way * do not sleep, refresh the index instead * remove client warnings * fix unit tests * fix opensearch compatibility * fix unit tests * update ES version * bump elasticsearch-py * adjust docs * use recreate_index param * use same fixture strategy for Opensearch * Update lg --------- Co-authored-by: agnieszka-m <amarzec13@gmail.com>	2023-04-18 15:40:17 +02:00
Massimiliano Pippi	a03e8335aa	Ignore cross-reference properties when loading documents (#4664 ) * drop cross-reference properties * be more defensive * fix regression	2023-04-17 10:40:30 +02:00
ZanSara	174d80ab41	skip tests (#4654 )	2023-04-13 17:56:51 +02:00
Silvano Cerza	5ac3dffbef	test: Rework conftest (#4614 ) * Split root conftest into multiple ones and remove unused fixtures * Remove some constants and make them fixtures * Remove unnecessary fixture scoping * Fix failing whisper tests * Fix image_file_paths fixture	2023-04-11 10:33:43 +02:00
Silvano Cerza	3b5223fa1c	refactor: Mark MilvusDocumentStore as deprecated (#4498 ) * Mark MilvusDocumentStore as deprecated * Fix mypy	2023-03-27 15:31:48 +02:00
Silvano Cerza	5b63c2086e	refactor: Deprecate BaseKnowledgeGraph, GraphDBKnowledgeGraph, InMemoryKnowledgeGraph and Text2SparqlRetriever (#4500 ) * Deprecate BaseKnowledgeGraph and InMemoryKnowledgeGraph * Deprecate GraphDBKnowledgeGraph * Fix mypy * Deprecate Text2SparqlRetriever	2023-03-27 15:31:22 +02:00
kaixuanliu	edf39edda0	fix: when using IVF* indexing, ensure the index is trained first (#4311 ) * add protection, in case we use IVF* indexing, we need to train the index first Signed-off-by: Liu,Kaixuan <kaixuan.liu@intel.com> * fix formatting issue Signed-off-by: Liu,Kaixuan <kaixuan.liu@intel.com> * just raising error, instead of silently training the index * fixed mypy issue * fixed error msg --------- Signed-off-by: Liu,Kaixuan <kaixuan.liu@intel.com> Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>	2023-03-15 08:55:37 +01:00
Massimiliano Pippi	5aa19ffde6	remove deprecated OpenDistroElasticsearchDocumentStore (#4361 )	2023-03-14 09:12:49 +01:00
Massimiliano Pippi	83d615a32b	feat: include testing facilities into haystack package (#4182 )	2023-02-17 19:38:03 +01:00
bogdankostic	7eeb3e07bf	feat: Add IVF and Product Quantization support for OpenSearchDocumentStore (#3850 ) * Add IVF and Product Quantization support for OpenSearchDocumentStore * Remove unused import statement * Fix mypy * Adapt doc strings and error messages to account for PQ * Adapt validation of indices * Adapt existing tests * Fix pylint * Add tests * Update lg * Adapt based on PR review comments * Fix Pylint * Adapt based on PR review * Add request_timeout * Adapt based on PR review * Adapt based on PR review * Adapt tests * Pin tenacity * Unpin tenacity * Adapt based on PR comments * Add match to tests --------- Co-authored-by: agnieszka-m <amarzec13@gmail.com>	2023-02-17 10:28:36 +01:00
Massimiliano Pippi	ec72dd73fc	refactor: complete the document stores test refactoring (#4125 ) * add e2e tests * move tests to their own module * add e2e workflow * pylint * remove from job * fix index field name * skip test on sql * removed unused code * fix embedding tests * adjust test for pinecone * adjust assertions to the new documents * bad copypasta * test * fix tests * fix tests * fix test * fix tests * pylint * update milvus version * remove debug * move graphdb tests under e2e	2023-02-16 09:43:25 +01:00
Stefano Fiorucci	24405f851c	refactor: `InMemoryDocumentStore` - manage documents without embedding & fix mypy errors (#4113 ) * refactoring and test * try to replace error with warning * more expressive and robust get_scores methods * make get_scores methods internal	2023-02-14 17:43:11 +01:00
bogdankostic	986472c26f	feat: Add BM25 support for tables in InMemoryDocumentStore (#4090 ) * Add BM25 support for tables in InMemoryDocumentStore * Add table type to query method * Fix import order * Adapt tests	2023-02-09 10:47:35 +01:00
Silvano Cerza	274746db07	style: Update black (#4101 ) * Update black version * Format file with new black style * Update black pre-commit hook version	2023-02-08 15:34:43 +01:00

121 Commits