81 Commits

Author SHA1 Message Date
Silvano Cerza
3b5223fa1c
refactor: Mark MilvusDocumentStore as deprecated (#4498)
* Mark MilvusDocumentStore as deprecated

* Fix mypy
2023-03-27 15:31:48 +02:00
Silvano Cerza
5b63c2086e
refactor: Deprecate BaseKnowledgeGraph, GraphDBKnowledgeGraph, InMemoryKnowledgeGraph and Text2SparqlRetriever (#4500)
* Deprecate BaseKnowledgeGraph and InMemoryKnowledgeGraph

* Deprecate GraphDBKnowledgeGraph

* Fix mypy

* Deprecate Text2SparqlRetriever
2023-03-27 15:31:22 +02:00
kaixuanliu
edf39edda0
fix: when using IVF* indexing, ensure the index is trained frist (#4311)
* add protection, in case we use IVF* indexing, we need to train the index first

Signed-off-by: Liu,Kaixuan <kaixuan.liu@intel.com>

* fix formatting issue

Signed-off-by: Liu,Kaixuan <kaixuan.liu@intel.com>

* just raising error, instead of silently training the index

* fixed mypy issue

* fixed error msg

---------

Signed-off-by: Liu,Kaixuan <kaixuan.liu@intel.com>
Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>
2023-03-15 08:55:37 +01:00
Massimiliano Pippi
5aa19ffde6
remove deprecated OpenDistroElasticsearchDocumentStore (#4361) 2023-03-14 09:12:49 +01:00
Massimiliano Pippi
83d615a32b
feat: include testing facilities into haystack package (#4182) 2023-02-17 19:38:03 +01:00
bogdankostic
7eeb3e07bf
feat: Add IVF and Product Quantization support for OpenSearchDocumentStore (#3850)
* Add IVF and Product Quantization support for OpenSearchDocumentStore

* Remove unused import statement

* Fix mypy

* Adapt doc strings and error messages to account for PQ

* Adapt validation of indices

* Adapt existing tests

* Fix pylint

* Add tests

* Update lg

* Adapt based on PR review comments

* Fix Pylint

* Adapt based on PR review

* Add request_timeout

* Adapt based on PR review

* Adapt based on PR review

* Adapt tests

* Pin tenacity

* Unpin tenacity

* Adapt based on PR comments

* Add match to tests

---------

Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-02-17 10:28:36 +01:00
Massimiliano Pippi
ec72dd73fc
refactor: complete the document stores test refactoring (#4125)
* add e2e tests

* move tests to their own module

* add e2e workflow

* pylint

* remove from job

* fix index field name

* skip test on sql

* removed unused code

* fix embedding tests

* adjust test for pinecone

* adjust assertions to the new documents

* bad copypasta

* test

* fix tests

* fix tests

* fix test

* fix tests

* pylint

* update milvus version

* remove debug

* move graphdb tests under e2e
2023-02-16 09:43:25 +01:00
Stefano Fiorucci
24405f851c
refactor: InMemoryDocumentStore - manage documents without embedding & fix mypy errors (#4113)
* refactoring and test

* try to replace error with warning

* more expressive and robust get_scores methods

* make get_scores methods internal
2023-02-14 17:43:11 +01:00
bogdankostic
986472c26f
feat: Add BM25 support for tables in InMemoryDocumentStore (#4090)
* Add BM25 support for tables in InMemoryDocumentStore

* Add table type to query method

* Fix import order

* Adapt tests
2023-02-09 10:47:35 +01:00
Silvano Cerza
274746db07
style: Update black (#4101)
* Update black version

* Format file with new black style

* Update black pre-commit hook version
2023-02-08 15:34:43 +01:00
tstadel
92c58cfda1
feat: Support multiple document_ids in Answer object (for generative QA) (#4062)
* initial version without shapers

* set document_ids for BaseGenerator

* introduce question-answering-with-references template

* better prompt

* make PromptTemplate control output_variable

* update schema

* fix add_doc_meta_data_to_answer

* Revert "fix add_doc_meta_data_to_answer"

This reverts commit b994db423ad8272c140ce2b785cf359d55383ff9.

* fix add_doc_meta_data_to_answer

* fix eval

* fix pylint

* fix pinecone

* fix other tests

* fix test

* fix flaky test

* Revert "fix flaky test"

This reverts commit 7ab04275ffaaaca96b4477325ba05d5f34d38775.

* adjust docstrings

* make Label loading backward-compatible

* fix Label backward compatibility for pinecone

* fix Label backward compatibility for search engines

* fix Label backward compatibility for deepset Cloud

* fix tests

* fix None issue

* fix test_write_feedback

* add tests for legacy label support

* add document_id test for pinecone

* reduce unnecessary contents

* add comment to pinecone test
2023-02-08 08:37:22 +01:00
bogdankostic
1a8fe0031d
feat: Add use_prefiltering parameter to DeepsetCloudDocumentStore (#3969)
* Add `use_prefiltering` parameter

* Adapt doc string

* Pass use_prefiltering via API to dC

* Adapt doc string

* Adapt test
2023-01-30 15:12:34 +01:00
Sebastian
71de0524de
fix: fixed InMemoryDocumentStore.get_embedding_count to return correct number (#3980)
* Fix the embedding count function of InMemoryDocumentStore

* Adding some doc strings explaining how many docs with embeddings to expect.
2023-01-30 12:38:30 +01:00
Massimiliano Pippi
52b195faf6
increase the timeout for testing (#3957) 2023-01-26 16:04:43 +01:00
Fabian
61ebe4b5dc
fix: authenticate with aws4auth if set in OpenSearchDocumentStore (#3741)
* bug(OpenSearchDocumentStore): fix authenticate with aws4auth if set.

Rearrange check to authenticate with aws4auth before username
and password, as the username is set to "admin" by default.

* Make username check less restrictive

* Fix test, do not used mocked _init_client function

* Add warning for aws4auth and username to ElasticSearchDocumentStore

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2023-01-24 10:01:39 +01:00
Ahmed Nabil
12e057837b
Adding condition to pinecone object. (#3768)
* Adding condition to `pinecone` object.

While you can assign any values to `PineconeDocumentStore`'s parameter `pinecone_index`, it must have another condition to prevent that from happening.

* Added test, and changed the code to make sure the pinecone idx variable has correct instance

* fixed black error

Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>
2023-01-19 01:34:44 +05:30
tstadel
6ca88bfd23
fix: Despite return_embedding=False SearchEngineDocumentStore.query retrieves embedding_field (#3662)
* fix: Despite return_embedding=False SearchEngineDocumentStore.query retrieves embedding_field

* fix pylint

* add tests

* fix mypy

* fix merge

* format

* fix pylint

* move tests to SearchEngineDocumentStoreTestAbstract

* move missed constants

* add mocked_document_store fixture to TestElasticsearchDocumentStore

* fix mocked_document_store

* fix get_all_documents tests for elasticsearch>=7.16

* fix tests

* fix tests try 2
2023-01-09 11:58:23 +01:00
Julian Risch
0c2d13f1b8
bug: skip validating empty embeddings (#3774)
* skip validating empty embeddings

* skip batches without embeddings to update

* add unit test with mocked retriever
2023-01-05 15:13:57 +01:00
tstadel
6c067b2b4f
feat: make score_script first class citizen via knn_engine param (#3284)
* OpenSearchDocumentStore: make score_script accessible via knn_engine

* blacken

* fix tests

* fix format

* fix naming of 'score_script' consistently

* fix tests

* fix test

* fix ef_search tests

* always validate index

* improve clone_embedding_field

* fix pylint

* reformat

* remove port

* update tests

* set no_implicit_optional = false

* fix myp

* fix test

* refactorings

* reformat

* fix and refactor tests

* better tests

* create search_field mappings

* remove no_implicit_optional = false

* skip validation for custom mapping

* format

* Apply suggestions from docs code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Apply tougher suggestions from code review

* fix messages

* fix typos

* update tests

* Update haystack/document_stores/opensearch.py

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* fix tests

* fix ef_search validation

* add test for ef_search nmslib

* fix assert_not_called

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-12-27 15:24:31 +01:00
Zoltan Fedor
e143f7cc36
Fixing broken BM25 support with Weaviate - fixes #3720 (#3723)
* Fixing broken BM25 support with Weaviate - fixes #3720

Unfortunately the BM25 support with Weaviate got broken with Haystack v1.11.0+, which is getting fixed with this commit.

Please see more under issue #3720.

* Fixing mypy issue - method signature wasn't matching the base class

* Mypy related test fix

Mypy forced me to set the signature of the `query` method of the Weaviate document store to the same as its parent, the `KeywordDocumentStore`, where the `query` parame is `Optional`, but has NO default value, so it must be provided (as None) at runtime.
I am not quite sure why the abstract method's `query` param was set without a default value while its type is `Optional`, but I didn't want to change that, so instead I have changed the Weaviate tests.

* Adding a note regarding an upcomming fix in Weaviate v1.17.0

* Apply suggestions from code review

* revert

* [EMPTY] Re-trigger CI
2022-12-19 17:24:46 +01:00
James Briggs
520b23ec1b
fix: pinecone metadata format (#3660)
* fix for multilevel metadata dictionaries

* add metadata dict formating to update function

* typing

* added check for labels meta

* added more info to input parameters

* added test for multilayer metadata

* removed todo
2022-12-13 10:11:24 +01:00
tstadel
600dc2d611
refactor: filters type (#3682)
* consolidate filters type

* remove unnecessary optionals

* fix mypy

* fix pylint

* fix pylint

* move FilterType to schema

* remove Optional from FilterType

* move to Dict[str, Any]

* Revert "move to Dict[str, Any]"

This reverts commit e8c561bb7885949e19825697fa4c469945f90ce5.

* fix mypy

* fix pylint

* revert isort changes in elasticsearch

* remove todos in milvus.py

* remove todos in sql.py

* add aggregate_labels tests

* consolidate aggregate_labels tests

* remove superfluous type todos

* remove ALL superfluous #todos
2022-12-12 14:04:29 +01:00
tstadel
c1c1c97bb2
feat: add query_by_embedding_batch (#3546)
* add query_by_embedding_batch

* fix mypy

* fix pylint

* add test

* move query_by_embedding_batch to search_engine

* fix and add tests

* fix pylint

* remove Retriever query logs

* add test for multimodal batch retrieval

* allow for np.ndarray
2022-12-08 08:28:43 +01:00
Sara Zan
fc89f6ea74
fix: revert Weaviate query with filters and improve tests (#3646)
* revert weaviate query with filters and improve tests

* pylint

* upgrade weaviate container

* use latest docker tag

* fix text

* fix text
2022-12-06 14:48:58 +01:00
Massimiliano Pippi
b20f808119
refactor: move more tests to the base class (#3637)
* move more tests to the base class

* skip tests where unsupported

* do not pass index label explicitly

* skip test for Pinecone
2022-11-29 08:43:27 +01:00
Sara Zan
eb7b9452d0
refactor: Weaviate query with filters (#3628) 2022-11-28 12:26:33 +01:00
Massimiliano Pippi
c6890c3e86
chore: remove redundant tests (#3620)
* remove redundant tests

* skip test on win

* fix missing import

* revert mistake

* revert
2022-11-25 20:55:21 +05:30
Massimiliano Pippi
a15af7f8c3
refactor: Move InMemoryDocumentStore tests to their own class (#3614)
* move tests to their own class

* move more tests

* add specific job

* fix test

* Update test/document_stores/test_memory.py

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-11-23 15:33:46 +01:00
Stefano Fiorucci
3040e59c63
feat: add support for BM25Retriever in InMemoryDocumentStore (#3561)
* very first draft

* implement query and query_batch

* add more bm25 parameters

* add rank_bm25 dependency

* fix mypy

* remove tokenizer callable parameter

* remove unused import

* only json serializable attributes

* try to fix: pylint too-many-public-methods / R0904

* bm25 attribute always present

* convert errors into warnings to make the tutorial 1 work

* add docstrings; tests

* try to make tests run

* better docstrings; revert not running tests

* some suggestions from review

* rename elasticsearch retriever as bm25 in tests; try to test memory_bm25

* exclude tests with filters

* change elasticsearch to bm25 retriever in test_summarizer

* add tests

* try to improve tests

* better type hint

* adapt test_table_text_retriever_embedding

* handle non-textual docs

* query only textual documents
2022-11-22 09:24:52 +01:00
Massimiliano Pippi
ea75e2aab5
feat: store metadata using JSON in SQLDocumentStore (#3547)
* add warnings

* make the field cachable

* review comment
2022-11-18 08:26:19 +01:00
Massimiliano Pippi
1399681c81
move milvus tests to their own module (#3596) 2022-11-17 16:22:02 +01:00
Stefano Fiorucci
dc26e6d43e
fix: Flatten DocumentClassifier output in SQLDocumentStore; remove _sql_session_rollback hack in tests (#3273)
* first draft

* fix

* fix

* move test to test_sql
2022-11-16 12:20:57 +01:00
Massimiliano Pippi
ba75d39029
fix: discard metadata fields if not set in Weaviate (#3578)
* fix weaviate bug in returning embeddings and setting empty meta fields

* review comment
2022-11-15 22:02:53 +01:00
tstadel
6ce2d296f4
fix: Elasticsearch / OpenSearch brownfield function does not incorporate meta (#3572)
* fix meta bug

* adjust brownfield test
2022-11-15 12:13:21 +01:00
Stefano Fiorucci
9de56b0283
fix: write metadata to SQL Document Store when duplicate_documents!="overwrite" (#3548)
* add_all fixes the bug

* improved test
2022-11-15 10:04:04 +01:00
Massimiliano Pippi
6a48ace9b9
BREAKING CHANGE: remove Milvus1DocumentStore along with support for Milvus < 2.x (#3552)
* remove milvus1

* leftover

* revert deprecation process
2022-11-15 09:54:55 +01:00
Massimiliano Pippi
057a8c0b4f
refactor: Pinecone tests (#3555)
* add pytest option to unmock pinecone

* first try

* handle missing answer

* fix labels metadata

* more tests

* adapt workflow

* typo

* address review comments
2022-11-14 15:19:15 +01:00
Massimiliano Pippi
4dfddf0d10
refactor: Refactor Weaviate tests (#3541)
* refactor tests

* fix job

* revert

* revert

* revert

* use latest weaviate

* fix abstract methods signatures

* pass class_name to all the CRUD methods

* finish moving all the tests

* bump weaviate version

* raise, don't pass
2022-11-14 09:57:30 +01:00
Massimiliano Pippi
3319ef6d1c
refactor: refactor FAISS tests (#3537)
* fix write docs behaviour

* refactor FAISS tests

* do not remove the sqlite db

* try

* remove extra slash

* Apply suggestions from code review

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>

* review comments

* Update test/document_stores/test_faiss.py

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>

* review comments

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-11-08 16:37:01 +01:00
Massimiliano Pippi
255072d8d5
refactor: move dC tests to their own module and job (#3529)
* move dC tests to their own module and job

* restore global var

* revert
2022-11-04 17:05:10 +01:00
Massimiliano Pippi
2bb81331b7
feat: add SQLDocumentStore tests (#3517)
* port SQL tests

* cleanup document_store_tests.py from sql tests

* leftover

* Update .github/workflows/tests.yml

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>

* review comments

* Update test/document_stores/test_base.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-11-04 09:24:19 +01:00
Stefano Fiorucci
4b0894f4c2
fix: support long texts for labels in ElasticsearchDocumentStore (#3346) 2022-11-02 11:16:36 +01:00
Massimiliano Pippi
b694c7b5cb
Document Store test refactoring (#3449)
* add new marker

* start using test hierarchies

* move ES tests into their own class

* refactor test workflow

* job steps

* add more tests

* move more tests

* more tests

* test labels

* add more tests

* Update tests.yml

* Update tests.yml

* fix

* typo

* fix es image tag

* map es ports

* try

* fix

* default port

* remove opensearch from the markers sorcery

* revert

* skip new tests in old jobs

* skip opensearch_faiss
2022-10-31 15:30:14 +01:00
Stefano Fiorucci
54ec13eaf7
refactor: Change no_answer attribute (#3411)
* always run validation

* update schemas

* no_answer as a property. break things!

* forgotten schema

* fix

* update openapi

* removed my unnecessary test

* fix sql document store

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-10-25 13:07:00 +02:00
Mayank Jobanputra
d48577b4e7
bug: removed duplicated meta "name" field addition to content before embedding in update_embeddings workflow (#3368)
* Removed explicit passage formatting by name field

* passing correct input type for embedding the docs

* Updated test, updated similarity scores and added results

* changed expected input to embed method
2022-10-25 14:52:05 +05:30
Stefano Fiorucci
7290196c32
fix: allow same vector_id in different indexes for SQL-based Document stores (#3383)
* fix_multiple_indexes

* improve test names
2022-10-14 09:55:56 +02:00
Massimiliano Pippi
31fa75e9fd
feat: add support for Elasticsearch 7.16.2 (#3318)
* bump elastic to 7.16.2+

* decouple Elasticsearch and Opensearch

use method override instead of func variables

fix mypy

default value

fix broken tests

update schema

* relax version pin

* rename the base class

* rename module

* fix import order

* do not run the new tests in the old job

* remove outdated TODO
2022-10-13 11:53:27 +02:00
tstadel
b84a6b1716
fix: opensearch script score with filters (#3321)
* fix opensearch script score filters

* add comment

* add integration test

* update schema
2022-10-06 15:41:29 +02:00
tstadel
05a86b9d3d
feat: FAISS in OpenSearch: Support HNSW for cosine (#3217)
* support cosine similiarity with faiss

* update docs

* update api docs

* fix tests

* Revert "update api docs"

This reverts commit 6138fdfefb3beaee2d55c5729cd4a2745ea6b143.

* fix api docs

* collapse test

* rename similairity to space_type mappings

* only normalize for faiss

* fix merge

* fix docs normalization

* get rid of List[np.array]

* update docs

* fix tests and tutorials

* fix mypy

* fix mypy

* fix mypy again

* again mypy

* blacken

* update tutorial  4 docs

* fix embeddingretriever

* fix faiss

* move dense specific logic to DenseRetriever

* fix mypy

* cosine tests for all documents stores

* fix pinecone

* add docstring

* docstring corrections

* update docs

* add integration test marker

* docstrings update

* update docs

* fix typo

* update docs

* fix MockDenseRetriever

* run integration tests for all documentstores

* fix test_update_embeddings_cosine_similarity

* fix faiss tests not running

* blacken

* make test_cosine_sanity_check integration test

* split PR

* update docs

* manually revert tutorial doc change

* Fix embedding type

* set integration marker correctly

* make BaseDocumentStore.normalize_embedding static

* format

* fix handling of opensearch_faiss param

* fix merge

* add DenseRetriever typing

* organize imports in conftest.py

* organize imports in conftest.py (2)

* fix DenseRetriever import

* add opensearch-tests-linux
2022-09-23 13:26:49 +02:00
tstadel
4fa9d2d8e7
Fix milvus and faiss tests not running (#3263)
* fix milvus and faiss tests not running

* fix schema manually

* fix test_dpr_embedding test for milvus

* pip freeze on milvus tests

* fix milvus1 tests being executed: fix all_doc_stores order

* Revert "pip freeze on milvus tests"

This reverts commit 75ebb6f7e507bb8477e87d9e63b4a294f7946cab.

* make infer_required_doc_store more robust

* don't skip tests without docstore requirements

* use markers for docstore tests
2022-09-22 17:46:49 +02:00