haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2026-01-07 04:27:15 +00:00

Author	SHA1	Message	Date
JohnKagunda	59403de1f0	feat: added `return_embedding` attr in `in_memory/document_store` (#9622 ) * feat: added to init * feat: added return_embedding in to_dict * feat: added return_embedding to filter_documents * feat: added return_embedding to bm25_retrieval * refactor: embedding_retrieval to use return_embedding attribute rather than parameter passed * docs: added releasenote * fix: pop from doc_fields instead of changing return_documents attr to none * fix: made return_embedding an optional field and removed deprecation warning * fix: give return_embedding a higher priority than self.return_embedding * feat: changed default behaviour of return_embedding to True * chore: update tests after InMemory Document store update * Update releasenotes/notes/update-in-memory-document-store-17f555695caf9d52.yaml Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com> * chore: update docs * chore: enhanced clarity and redability of expression * test: return_embedding is set to false during initialization * test: overriding return_embedding inside * fix: changed the use of self.filter_documents to actual implementation inside `embedding_retrieval` Signed-off-by: rafaeljohn9 <rafaeljohb@gmail.com> --------- Signed-off-by: rafaeljohn9 <rafaeljohb@gmail.com> Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>	2025-07-23 10:48:14 +00:00
Sebastian Husch Lee	85258f0654	fix: Fix types and formatting pipeline test_run.py (#9575 ) * Fix types in test_run.py * Get test_run.py to pass fmt-check * Add test_run to mypy checks * Update test folder to pass ruff linting * Fix merge * Fix HF tests * Fix hf test * Try to fix tests * Another attempt * minor fix * fix SentenceTransformersDiversityRanker * skip integrations tests due to model unavailable on HF inference --------- Co-authored-by: anakin87 <stefanofiorucci@gmail.com>	2025-07-03 09:49:09 +02:00
Stefano Fiorucci	f8155e1b77	chore: clean up (#9504 )	2025-06-11 11:05:05 +02:00
Stefano Fiorucci	e3f9da13d0	test: fix test incorrectly marked as async (#9327 ) * test: fix test incorrectly marked as async * fix inmemory async tests	2025-04-30 14:07:30 +00:00
David S. Batista	672ab09477	fix: cleaning up `InMemoryDocumentStore` executor when created inside the class (#8994 ) * cleaning up executor when created inside the class * adding missed tests	2025-03-07 11:01:29 +01:00
David S. Batista	9581fea3bc	feat: adding async version of `InMemoryDocumentStore` and associated retrievers (#8963 ) * adding classes from experimental * adding release notes * adding tests * merging all into a single class * adding async retriever methods * Update haystack/document_stores/in_memory/document_store.py Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com> * adding missed tests --------- Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>	2025-03-05 11:36:24 +01:00
Stefano Fiorucci	f3c44be904	refactor!: remove `dataframe` field from `Document` and `ExtractedTableAnswer`; make `pandas` optional (#8906 ) * remove dataframe * release note * small fix * group imports * Update pyproject.toml Co-authored-by: Julian Risch <julian.risch@deepset.ai> * Update pyproject.toml Co-authored-by: Julian Risch <julian.risch@deepset.ai> * address feedback --------- Co-authored-by: Julian Risch <julian.risch@deepset.ai>	2025-03-04 11:06:07 +00:00
Stefano Fiorucci	bc30105fbc	test: reorganize docstore test suite to isolate dataframe tests (#8684 ) * reorganize docstore test suite to isolate dataframe tests * improve docstring * include FilterDocumentsTestWithDataframe in InMemoryDocumentStore tests	2025-01-08 14:58:52 +00:00
Vladimir Blagojevic	7e9f153e78	chore: Remove all references to old filter syntax (#8342 ) * Remove all references to old filter syntax * More removals * Lint * Do not remove test_filter_retriever.py * Add reno note * Update ValueError text to match text in haystack-core-integrations	2024-09-12 16:28:31 +02:00
Vladimir Blagojevic	21c507331c	feat: Implement apply_filter_policy and FilterPolicy.MERGE for the new filters (#8042 )	2024-08-09 12:04:24 +02:00
David Berenstein	08104e0042	feat: InMemoryDocumentStore serialization (#7888 ) * Add: InMemoryDocumentStore serialization * Add: additional chek to test if path exists * Fix: failing test	2024-06-21 16:45:25 +02:00
Vladimir Blagojevic	4c59000c21	feat: Add apply_filter_policy function (#7902 ) * Add apply_filter_policy * Add release note	2024-06-20 13:44:23 +02:00
Silvano Cerza	854c4173f2	feat: Add memory sharing between different instances of `InMemoryDocumentStore` (#7781 ) * Add memory sharing between different instances of InMemoryDocumentStore * Fix FilterRetriever tests * Fix InMemoryBM25Retriever tests	2024-05-31 16:44:14 +02:00
Massimiliano Pippi	10c675d534	chore: add license header to all modules (#7675 ) * add license header to modules * check license header at linting time	2024-05-09 13:40:36 +00:00
Guest400123064	cd66a80ba2	perf: enhanced `InMemoryDocumentStore` BM25 query efficiency with incremental indexing (#7549 ) * incorporating better bm25 impl without breaking interface * all three bm25 algos * 1. setting algo post-init not allowed; 2. remove extra underscore for naming consistency; 3. remove unused import * 1. rename attribute name for IDF computation 2. organize document statistics as a dataclass instead of tuple to improve readability * fix score type initialization (int -> float) to pass mypy check * release note included * fixing linting issues and mypy * fixing tests * removing heapq import and cleaning up logging * changing indexing order * adding more tests * increasing tests * removing rank_bm25 from pyproject.toml --------- Co-authored-by: David S. Batista <dsbatista@gmail.com>	2024-05-03 12:10:15 +00:00
ZanSara	1182c08daf	fix: Dont filter negative scores when using `BM25Okapi` and `scale_score=False` (#6889 ) * dont filter negatives for unscaled Okapi * change BM25 algorithm default to BM25L * Update haystack/document_stores/in_memory/document_store.py * improve comment	2024-02-06 11:07:27 +01:00
Madeesh Kannan	a5189dd035	fix!: `InMemoryBM25Retriever` no longer returns documents that have a score of 0.0 (#6717 ) * fix!: `InMemoryBM25Retriever` no longer returns documents that have a score of 0.0 Also update tests to accommodate the new behavior. * Remove superfluous code	2024-01-12 17:50:55 +01:00
Massimiliano Pippi	e1ec4e5e4d	refact!: Remove symbols under the `haystack.document_stores` namespace (#6714 ) * remove symbols under the haystack.document_stores namespace * Update haystack/document_stores/types/protocol.py Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> * fix * same for retrievers * leftovers * more leftovers * add relnote * leftovers * one more * fix examples --------- Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>	2024-01-10 21:20:42 +01:00
Massimiliano Pippi	7c05f37a53	remove unit marker (#6450 )	2023-11-29 19:24:25 +01:00
Silvano Cerza	831d0611d9	feat: Change default `DuplicatePolicy` in `DocumentStore.write_documents()` (#6438 ) * Change default DuplicatePolicy in DocumentStore.write_documents() * Add release notes	2023-11-28 12:30:17 +01:00
Silvano Cerza	e6637f5ec2	Fix all tests	2023-11-24 14:48:43 +01:00
Massimiliano Pippi	8adb8bbab8	Remove preview folder in test/ --------- Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>	2023-11-24 11:52:55 +01:00
Massimiliano Pippi	09e7831f60	clean up 1.x code --------- Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>	2023-11-24 11:47:47 +01:00
Ivana Zeljkovic	2326f2f9fe	feat: Pinecone document store optimizations (#5902 ) * Optimize methods for deleting documents and getting vector count. Enable warning messages when Pinecone limits are exceeded on Starter index type. * Fix typo * Add release note * Fix mypy errors * Remove unused import. Fix warning logging message. * Update release note with description about limits for Starter index type in Pinecone * Improve code base by: - Adding new test cases for get_embedding_count method - Fixing get_embedding_count method - Improving delete documents - Fix label retrieval - Increase default batch size - Improve get_document_count method * Remove unused variable * Fix mypy issues	2023-10-16 19:26:24 +02:00
Christian Clauss	bf6d306d68	ci: Simplify Python code with ruff rules SIM (#5833 ) * ci: Simplify Python code with ruff rules SIM * Revert #5828 * ruff --select=I --fix haystack/modeling/infer.py --------- Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2023-09-20 08:32:44 +02:00
Christian Clauss	91ab90a256	perf: Python performance improvements with ruff C4 and PERF fixes (#5803 ) * Python performance improvements with ruff C4 and PERF * pre-commit fixes * Revert changes to examples/basic_qa_pipeline.py * Revert changes to haystack/preview/testing/document_store.py * revert releasenotes * Upgrade to ruff v0.0.290	2023-09-16 16:26:07 +02:00
Christian Clauss	1bc03ddc73	ci: Fix all ruff pyflakes errors except unused imports (#5820 ) * ci: Fix all ruff pyflakes errors except unused imports * Delete releasenotes/notes/fix-some-pyflakes-errors-69a1106efa5d0203.yaml	2023-09-15 18:30:33 +02:00
Christian Clauss	9405eb90ee	ci: Fix invalid escape sequences in Python code (#5802 ) * ci: Use ruff in pre-commit to further limit complexity * Fix invalid escape sequences in Python code * Delete releasenotes/notes/ruff-4d2504d362035166.yaml	2023-09-14 16:42:48 +02:00
Ivana Zeljkovic	4bad202197	feat: Pinecone document store refactoring (#5725 ) * Refactor codebase so that doc_type metadata is used instead of namespaces for making distinction between documents without embeddings, documents with embeddings and labels * Fix parameter name in integration test * Remove code under comment in add_type_metadata_filter method * Fix mypy and pylint checks * Add release note * Apply minimal changes: rename method, update method docs and remove redundant method * Mypy fixes * Fix docstrings * Revert helper methods for fetching documents when the number of documents exceeds Pinecone limit * Remove unnecessary attributes in PineconeDocumentStore * Fix unit test --------- Co-authored-by: Ivana Zeljkovic <ivana.zeljkovic@smartcat.io> Co-authored-by: DosticJelena <jelena.dostic@smartcat.io>	2023-09-14 11:46:47 +02:00
Christian Clauss	6dd52d91b2	ci: Fix typos discovered by codespell (#5778 ) * Fix typos discovered by codespell * pylint: max-args = 38	2023-09-13 16:14:45 +02:00
tstadel	d46c84bb61	feat: support dynamic filters in custom_query (#5427 ) * support filters in custom_query * better tests * Update docstrings --------- Co-authored-by: agnieszka-m <amarzec13@gmail.com>	2023-08-08 15:48:15 +02:00
bogdankostic	237d67dbfd	feat: Check version of Elasticsearch server and add support for Elasticsearch <= 7.5 (#5320 ) * Check ES server version + add support for ES <= 7.5 * Adapt comment * PR feedback	2023-07-13 14:50:43 +02:00
bogdankostic	b7f683bfa4	ci: Add unit test for Elasticsearch8 (#5300 ) * Add job for ES8 integration tests * Add unit test for Elasticsearch 8 * Add tests.yml * Adapt tests.yml * Remove added white space * Adapt tests.yml * Adapt tests.yml * Add dependencies to unit test name * Adapt unit test matrix * Adapt unit test matrix * Adapt unit test matrix * Adapt unit test matrix * Update tests.yml * Create separate tests where necessary * Fix skip * Adapt tests	2023-07-10 16:03:50 +02:00
tstadel	9acb275680	fix: avoid conflicts with opensearch / elasticsearch magic attributes during bulk requests (#5113 ) * use _source on opensearch bulk requests * fix label bulk requests * add tests * fix test * apply feedback --------- Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>	2023-07-07 15:12:50 +02:00
Vladimir Blagojevic	1066e959a2	bug: fix for pinecone not working for per document updates (#5110 )	2023-07-03 14:07:52 +02:00
Stefano Fiorucci	1be39367ac	Fix: `FAISSDocumentStore` - make `write_documents` properly work in combination w `update_embeddings` (#5221 ) * Update VERSION.txt * first draft * simplify method and test * rm unnecessary pb.close * integrate feedback	2023-07-03 10:07:36 +02:00
Massimiliano Pippi	cb638af0ff	refactor: fix method type and add comments (#5235 ) * fix method type and add comments * fix tests	2023-06-30 11:55:52 +02:00
Massimiliano Pippi	037e4f24ce	refactor: add a new Document Store supporting Elasticsearch 8 (#5231 ) * introduce es8 * prepare tests * fix unit tests * adjust tests * install elastic_transport package * make mypy happy * fix opensearch tests	2023-06-29 16:40:10 +02:00
bogdankostic	8c63e295f4	fix: Allow filtering on list fields in `InMemoryDocumentStore` with all operators (#5208 ) * Add support for list fields * Unskip tests	2023-06-29 12:10:39 +02:00
Massimiliano Pippi	6373e2ea66	refactor: prepare support to Elasticsearch 8 (#5226 ) * make a package * Update haystack/document_stores/elasticsearch/es7.py Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> * do not expose ES types from the package --------- Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>	2023-06-29 11:06:20 +02:00
Shukri	916e8452f5	feat!: simplify weaviate auth (#5115 ) * feat!: simplify weaviate auth * docs: explain param precedence * refactor: simplify _get_embedded_options	2023-06-19 15:46:58 +02:00
Ben Heckmann	60e5d73424	fix: changing document scores (#5090 ) * #4653 fix changing scores by returning new document objects from document store queries * added integration test for InMemoryDocumentStore demonstrating the desired behavior * Update test/document_stores/test_memory.py	2023-06-14 17:35:46 +02:00
bogdankostic	da1f245a84	feat: Add batch_size parameter and cast timeout_config value to tuple for `WeaviateDocumentStore` (#5079 ) * Add batch_size parameter and cast timeout_config to tuple * Add unit test * Remove debug tqdm * Remove debug tqdm introduced in #5063	2023-06-06 17:06:10 +02:00
bogdankostic	a9a49e2c0a	feat: Add batching for querying in `ElasticsearchDocumentStore` and `OpenSearchDocumentStore` (#5063 ) * Include benchmark config in output * Use queries from aggregated labels * Introduce batching for querying in ElasticsearchDocStore and OpenSearchDocStore * Fix mypy * Use self.batch_size in write_documents * Use 10_000 as default batch size * Add unit tests for write documents	2023-06-01 18:47:24 +02:00
Massimiliano Pippi	c6ea542b57	chore: remove BaseKnowledgeGraph (#4953 ) * remove BaseKnowledgeGraph * fix pylint	2023-05-21 10:42:02 +02:00
Massimiliano Pippi	4974bf7ab3	chore: remove deprecated MilvusDocumentStore (#4951 ) * remove deprecated MilvusDocumentStore * remove leftovers * fix pylint	2023-05-19 16:37:38 +02:00
Massimiliano Pippi	85254fe9f6	leftover from merge conflict (#4962 )	2023-05-19 16:10:26 +02:00
Massimiliano Pippi	58acef77c4	avoid importing the weaviate client directly (#4945 )	2023-05-18 16:08:53 +02:00
Shukri	ad162f2e65	feat: Support authentication using AuthBearerToken and AuthClientCredentials in Weaviate (#4028 ) * refactor: make the scope param configurable the scope parameter is used when authenticating using AuthClientPassword and AuthClientCredentials * feat: add support for AuthClientCredentials add support for authenticating using the OIDC Client Credentials authentication flow * feat: add support for AuthBearerToken Add support for authenticating using OIDC and bearer tokens * Update lg * refactor how client is built Signed-off-by: hsm207 <hsm207@users.noreply.github.com> * unit test the auth methods Signed-off-by: hsm207 <hsm207@users.noreply.github.com> * Update test_weaviate.py * revert formatting change * Fix type hints --------- Signed-off-by: hsm207 <hsm207@users.noreply.github.com> Co-authored-by: John Doe <johndoe@example.com> Co-authored-by: agnieszka-m <amarzec13@gmail.com> Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2023-05-18 10:17:11 +02:00
bogdankostic	43509c88bf	fix: Add support for `_split_overlap` meta to Pinecone and `dict` metadata in general to Weaviate (#4805 ) * Add support for dicts to Weaviate * Add support for _split_overlap to Pinecone * Add tests * Fix Pylint * Fix Pylint * Fix test * Implement PR feedback	2023-05-05 11:20:21 +02:00

137 Commits