65 Commits

Author SHA1 Message Date
tstadel
6ca88bfd23
fix: Despite return_embedding=False SearchEngineDocumentStore.query retrieves embedding_field (#3662)
* fix: Despite return_embedding=False SearchEngineDocumentStore.query retrieves embedding_field

* fix pylint

* add tests

* fix mypy

* fix merge

* format

* fix pylint

* move tests to SearchEngineDocumentStoreTestAbstract

* move missed constants

* add mocked_document_store fixture to TestElasticsearchDocumentStore

* fix mocked_document_store

* fix get_all_documents tests for elasticsearch>=7.16

* fix tests

* fix tests try 2
2023-01-09 11:58:23 +01:00
Julian Risch
0c2d13f1b8
bug: skip validating empty embeddings (#3774)
* skip validating empty embeddings

* skip batches without embeddings to update

* add unit test with mocked retriever
2023-01-05 15:13:57 +01:00
tstadel
6c067b2b4f
feat: make score_script first class citizen via knn_engine param (#3284)
* OpenSearchDocumentStore: make score_script accessible via knn_engine

* blacken

* fix tests

* fix format

* fix naming of 'score_script' consistently

* fix tests

* fix test

* fix ef_search tests

* always validate index

* improve clone_embedding_field

* fix pylint

* reformat

* remove port

* update tests

* set no_implicit_optional = false

* fix myp

* fix test

* refactorings

* reformat

* fix and refactor tests

* better tests

* create search_field mappings

* remove no_implicit_optional = false

* skip validation for custom mapping

* format

* Apply suggestions from docs code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Apply tougher suggestions from code review

* fix messages

* fix typos

* update tests

* Update haystack/document_stores/opensearch.py

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* fix tests

* fix ef_search validation

* add test for ef_search nmslib

* fix assert_not_called

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-12-27 15:24:31 +01:00
Zoltan Fedor
e143f7cc36
Fixing broken BM25 support with Weaviate - fixes #3720 (#3723)
* Fixing broken BM25 support with Weaviate - fixes #3720

Unfortunately the BM25 support with Weaviate got broken with Haystack v1.11.0+, which is getting fixed with this commit.

Please see more under issue #3720.

* Fixing mypy issue - method signature wasn't matching the base class

* Mypy related test fix

Mypy forced me to set the signature of the `query` method of the Weaviate document store to the same as its parent, the `KeywordDocumentStore`, where the `query` parame is `Optional`, but has NO default value, so it must be provided (as None) at runtime.
I am not quite sure why the abstract method's `query` param was set without a default value while its type is `Optional`, but I didn't want to change that, so instead I have changed the Weaviate tests.

* Adding a note regarding an upcomming fix in Weaviate v1.17.0

* Apply suggestions from code review

* revert

* [EMPTY] Re-trigger CI
2022-12-19 17:24:46 +01:00
James Briggs
520b23ec1b
fix: pinecone metadata format (#3660)
* fix for multilevel metadata dictionaries

* add metadata dict formating to update function

* typing

* added check for labels meta

* added more info to input parameters

* added test for multilayer metadata

* removed todo
2022-12-13 10:11:24 +01:00
tstadel
600dc2d611
refactor: filters type (#3682)
* consolidate filters type

* remove unnecessary optionals

* fix mypy

* fix pylint

* fix pylint

* move FilterType to schema

* remove Optional from FilterType

* move to Dict[str, Any]

* Revert "move to Dict[str, Any]"

This reverts commit e8c561bb7885949e19825697fa4c469945f90ce5.

* fix mypy

* fix pylint

* revert isort changes in elasticsearch

* remove todos in milvus.py

* remove todos in sql.py

* add aggregate_labels tests

* consolidate aggregate_labels tests

* remove superfluous type todos

* remove ALL superfluous #todos
2022-12-12 14:04:29 +01:00
tstadel
c1c1c97bb2
feat: add query_by_embedding_batch (#3546)
* add query_by_embedding_batch

* fix mypy

* fix pylint

* add test

* move query_by_embedding_batch to search_engine

* fix and add tests

* fix pylint

* remove Retriever query logs

* add test for multimodal batch retrieval

* allow for np.ndarray
2022-12-08 08:28:43 +01:00
Sara Zan
fc89f6ea74
fix: revert Weaviate query with filters and improve tests (#3646)
* revert weaviate query with filters and improve tests

* pylint

* upgrade weaviate container

* use latest docker tag

* fix text

* fix text
2022-12-06 14:48:58 +01:00
Massimiliano Pippi
b20f808119
refactor: move more tests to the base class (#3637)
* move more tests to the base class

* skip tests where unsupported

* do not pass index label explicitly

* skip test for Pinecone
2022-11-29 08:43:27 +01:00
Sara Zan
eb7b9452d0
refactor: Weaviate query with filters (#3628) 2022-11-28 12:26:33 +01:00
Massimiliano Pippi
c6890c3e86
chore: remove redundant tests (#3620)
* remove redundant tests

* skip test on win

* fix missing import

* revert mistake

* revert
2022-11-25 20:55:21 +05:30
Massimiliano Pippi
a15af7f8c3
refactor: Move InMemoryDocumentStore tests to their own class (#3614)
* move tests to their own class

* move more tests

* add specific job

* fix test

* Update test/document_stores/test_memory.py

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-11-23 15:33:46 +01:00
Stefano Fiorucci
3040e59c63
feat: add support for BM25Retriever in InMemoryDocumentStore (#3561)
* very first draft

* implement query and query_batch

* add more bm25 parameters

* add rank_bm25 dependency

* fix mypy

* remove tokenizer callable parameter

* remove unused import

* only json serializable attributes

* try to fix: pylint too-many-public-methods / R0904

* bm25 attribute always present

* convert errors into warnings to make the tutorial 1 work

* add docstrings; tests

* try to make tests run

* better docstrings; revert not running tests

* some suggestions from review

* rename elasticsearch retriever as bm25 in tests; try to test memory_bm25

* exclude tests with filters

* change elasticsearch to bm25 retriever in test_summarizer

* add tests

* try to improve tests

* better type hint

* adapt test_table_text_retriever_embedding

* handle non-textual docs

* query only textual documents
2022-11-22 09:24:52 +01:00
Massimiliano Pippi
ea75e2aab5
feat: store metadata using JSON in SQLDocumentStore (#3547)
* add warnings

* make the field cachable

* review comment
2022-11-18 08:26:19 +01:00
Massimiliano Pippi
1399681c81
move milvus tests to their own module (#3596) 2022-11-17 16:22:02 +01:00
Stefano Fiorucci
dc26e6d43e
fix: Flatten DocumentClassifier output in SQLDocumentStore; remove _sql_session_rollback hack in tests (#3273)
* first draft

* fix

* fix

* move test to test_sql
2022-11-16 12:20:57 +01:00
Massimiliano Pippi
ba75d39029
fix: discard metadata fields if not set in Weaviate (#3578)
* fix weaviate bug in returning embeddings and setting empty meta fields

* review comment
2022-11-15 22:02:53 +01:00
tstadel
6ce2d296f4
fix: Elasticsearch / OpenSearch brownfield function does not incorporate meta (#3572)
* fix meta bug

* adjust brownfield test
2022-11-15 12:13:21 +01:00
Stefano Fiorucci
9de56b0283
fix: write metadata to SQL Document Store when duplicate_documents!="overwrite" (#3548)
* add_all fixes the bug

* improved test
2022-11-15 10:04:04 +01:00
Massimiliano Pippi
6a48ace9b9
BREAKING CHANGE: remove Milvus1DocumentStore along with support for Milvus < 2.x (#3552)
* remove milvus1

* leftover

* revert deprecation process
2022-11-15 09:54:55 +01:00
Massimiliano Pippi
057a8c0b4f
refactor: Pinecone tests (#3555)
* add pytest option to unmock pinecone

* first try

* handle missing answer

* fix labels metadata

* more tests

* adapt workflow

* typo

* address review comments
2022-11-14 15:19:15 +01:00
Massimiliano Pippi
4dfddf0d10
refactor: Refactor Weaviate tests (#3541)
* refactor tests

* fix job

* revert

* revert

* revert

* use latest weaviate

* fix abstract methods signatures

* pass class_name to all the CRUD methods

* finish moving all the tests

* bump weaviate version

* raise, don't pass
2022-11-14 09:57:30 +01:00
Massimiliano Pippi
3319ef6d1c
refactor: refactor FAISS tests (#3537)
* fix write docs behaviour

* refactor FAISS tests

* do not remove the sqlite db

* try

* remove extra slash

* Apply suggestions from code review

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>

* review comments

* Update test/document_stores/test_faiss.py

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>

* review comments

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-11-08 16:37:01 +01:00
Massimiliano Pippi
255072d8d5
refactor: move dC tests to their own module and job (#3529)
* move dC tests to their own module and job

* restore global var

* revert
2022-11-04 17:05:10 +01:00
Massimiliano Pippi
2bb81331b7
feat: add SQLDocumentStore tests (#3517)
* port SQL tests

* cleanup document_store_tests.py from sql tests

* leftover

* Update .github/workflows/tests.yml

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>

* review comments

* Update test/document_stores/test_base.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-11-04 09:24:19 +01:00
Stefano Fiorucci
4b0894f4c2
fix: support long texts for labels in ElasticsearchDocumentStore (#3346) 2022-11-02 11:16:36 +01:00
Massimiliano Pippi
b694c7b5cb
Document Store test refactoring (#3449)
* add new marker

* start using test hierarchies

* move ES tests into their own class

* refactor test workflow

* job steps

* add more tests

* move more tests

* more tests

* test labels

* add more tests

* Update tests.yml

* Update tests.yml

* fix

* typo

* fix es image tag

* map es ports

* try

* fix

* default port

* remove opensearch from the markers sorcery

* revert

* skip new tests in old jobs

* skip opensearch_faiss
2022-10-31 15:30:14 +01:00
Stefano Fiorucci
54ec13eaf7
refactor: Change no_answer attribute (#3411)
* always run validation

* update schemas

* no_answer as a property. break things!

* forgotten schema

* fix

* update openapi

* removed my unnecessary test

* fix sql document store

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-10-25 13:07:00 +02:00
Mayank Jobanputra
d48577b4e7
bug: removed duplicated meta "name" field addition to content before embedding in update_embeddings workflow (#3368)
* Removed explicit passage formatting by name field

* passing correct input type for embedding the docs

* Updated test, updated similarity scores and added results

* changed expected input to embed method
2022-10-25 14:52:05 +05:30
Stefano Fiorucci
7290196c32
fix: allow same vector_id in different indexes for SQL-based Document stores (#3383)
* fix_multiple_indexes

* improve test names
2022-10-14 09:55:56 +02:00
Massimiliano Pippi
31fa75e9fd
feat: add support for Elasticsearch 7.16.2 (#3318)
* bump elastic to 7.16.2+

* decouple Elasticsearch and Opensearch

use method override instead of func variables

fix mypy

default value

fix broken tests

update schema

* relax version pin

* rename the base class

* rename module

* fix import order

* do not run the new tests in the old job

* remove outdated TODO
2022-10-13 11:53:27 +02:00
tstadel
b84a6b1716
fix: opensearch script score with filters (#3321)
* fix opensearch script score filters

* add comment

* add integration test

* update schema
2022-10-06 15:41:29 +02:00
tstadel
05a86b9d3d
feat: FAISS in OpenSearch: Support HNSW for cosine (#3217)
* support cosine similiarity with faiss

* update docs

* update api docs

* fix tests

* Revert "update api docs"

This reverts commit 6138fdfefb3beaee2d55c5729cd4a2745ea6b143.

* fix api docs

* collapse test

* rename similairity to space_type mappings

* only normalize for faiss

* fix merge

* fix docs normalization

* get rid of List[np.array]

* update docs

* fix tests and tutorials

* fix mypy

* fix mypy

* fix mypy again

* again mypy

* blacken

* update tutorial  4 docs

* fix embeddingretriever

* fix faiss

* move dense specific logic to DenseRetriever

* fix mypy

* cosine tests for all documents stores

* fix pinecone

* add docstring

* docstring corrections

* update docs

* add integration test marker

* docstrings update

* update docs

* fix typo

* update docs

* fix MockDenseRetriever

* run integration tests for all documentstores

* fix test_update_embeddings_cosine_similarity

* fix faiss tests not running

* blacken

* make test_cosine_sanity_check integration test

* split PR

* update docs

* manually revert tutorial doc change

* Fix embedding type

* set integration marker correctly

* make BaseDocumentStore.normalize_embedding static

* format

* fix handling of opensearch_faiss param

* fix merge

* add DenseRetriever typing

* organize imports in conftest.py

* organize imports in conftest.py (2)

* fix DenseRetriever import

* add opensearch-tests-linux
2022-09-23 13:26:49 +02:00
tstadel
4fa9d2d8e7
Fix milvus and faiss tests not running (#3263)
* fix milvus and faiss tests not running

* fix schema manually

* fix test_dpr_embedding test for milvus

* pip freeze on milvus tests

* fix milvus1 tests being executed: fix all_doc_stores order

* Revert "pip freeze on milvus tests"

This reverts commit 75ebb6f7e507bb8477e87d9e63b4a294f7946cab.

* make infer_required_doc_store more robust

* don't skip tests without docstore requirements

* use markers for docstore tests
2022-09-22 17:46:49 +02:00
tstadel
b10e2c392e
chore: add DenseRetriever abstraction (#3252)
* support cosine similiarity with faiss

* update docs

* update api docs

* fix tests

* Revert "update api docs"

This reverts commit 6138fdfefb3beaee2d55c5729cd4a2745ea6b143.

* fix api docs

* collapse test

* rename similairity to space_type mappings

* only normalize for faiss

* fix merge

* fix docs normalization

* get rid of List[np.array]

* update docs

* fix tests and tutorials

* fix mypy

* fix mypy

* fix mypy again

* again mypy

* blacken

* update tutorial  4 docs

* fix embeddingretriever

* fix faiss

* move dense specific logic to DenseRetriever

* fix mypy

* cosine tests for all documents stores

* fix pinecone

* add docstring

* docstring corrections

* update docs

* add integration test marker

* docstrings update

* update docs

* fix typo

* update docs

* fix MockDenseRetriever

* run integration tests for all documentstores

* fix test_update_embeddings_cosine_similarity

* fix faiss tests not running

* blacken

* make test_cosine_sanity_check integration test

* update docs

* fix imports

* import  DenseRetriever normally

* update docs

* fix deepcopy of documents

* update schema

* Revert "update schema"

This reverts commit 83cf8f323648468e1c322d54852bec084d637e3f.

* fix schema for ci manually
2022-09-21 19:08:54 +02:00
Stefano Fiorucci
89247b804c
refactor: make TransformersDocumentClassifier output consistent between different types of classification (#3224)
* make output consistent

* make output consistent

* added tests for details

* better tests

* Update test_document_classifier.py

* make black happy

* Update test_document_classifier.py

* Update test_document_classifier.py
2022-09-21 13:16:03 +02:00
Kristof Herrmann
da1cc577ae
feat: exponential backoff with exp decreasing batch size for opensearch client (#3194)
* Validate custom_mapping properly as an object

* Remove related test

* black

* feat: exponential backoff with exp dec batch size

* added docstring and split doc lsit

* fix

* fix mypy

* fix

* catch generic exception

* added test

* mypy ignore

* fixed no attribute

* added test

* added tests

* revert strange merge conflicts

* revert merge conflict again

* Update haystack/document_stores/elasticsearch.py

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* done

* adjust test

* remove not required caplog

* fixed comments

Co-authored-by: ZanSara <sarazanzo94@gmail.com>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2022-09-13 14:30:30 +01:00
bogdankostic
e2ec0d1c15
feat: FAISS in OpenSearch: check existing index (#3101)
* Add check for mapping for existing indices

* Add test

* Check if "method" field exists
2022-08-25 17:33:26 +02:00
tstadel
92046ce5b5
feat: FAISS in OpenSearch: Support HNSW for dot product and l2 (#3029)
* support faiss hnsw

* blacken

* update docs

* improve similarity check

* add tests

* update schema

* set ef_search param correctly

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* regenerate docs

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-08-24 16:43:48 +02:00
James Briggs
9b1b03002f
update to PineconeDocumentStore to remove dependency on SQL db (#2749)
* update to PineconeDocumentStore to remove dependency on SQL db

* Update Documentation & Code Style

* typing fixes

* Update Documentation & Code Style

* fixed embedding generator to yield Documents

* Update Documentation & Code Style

* fixes for final typing issues

* fixes for pylint

* Update Documentation & Code Style

* uncomment pinecone tests

* added new params to docstrings

* Update Documentation & Code Style

* Update Documentation & Code Style

* Update haystack/document_stores/pinecone.py

Co-authored-by: Sara Zan <sarazanzo94@gmail.com>

* Update haystack/document_stores/pinecone.py

Co-authored-by: Sara Zan <sarazanzo94@gmail.com>

* Update Documentation & Code Style

* Update haystack/document_stores/pinecone.py

Co-authored-by: Sara Zan <sarazanzo94@gmail.com>

* Update haystack/document_stores/pinecone.py

Co-authored-by: Sara Zan <sarazanzo94@gmail.com>

* Update haystack/document_stores/pinecone.py

Co-authored-by: Sara Zan <sarazanzo94@gmail.com>

* Update haystack/document_stores/pinecone.py

Co-authored-by: Sara Zan <sarazanzo94@gmail.com>

* changes based on comments, updated errors and install

* Update Documentation & Code Style

* mypy

* implement simple filtering in pinecone mock

* typo

* typo in reverse

* account for missing meta key in filtering

* typo

* added metadata filtering to describe index

* added handling for users switching indexes in same doc store, and handling duplicate docs in write

* syntax tweaks

* added index option to document/embedding count calls

* labels implementation in progress

* added metadata fields to be indexed for pinecone tests

* further changes to mock

* WIP implementation of labels+multilabels

* switched to rely on labels namespace rather than filter

* simpler delete_labels

* label fixes, remove debug code

* Apply dostring fixes

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* mypy

* pylint

* docs

* temporarily un-mock Pinecone

* Small Pinecone test suite

* pylint

* Add fake test key to pass the None check

* Add again fake test key to pass the None check

* Add Pinecone to default docstores and fix filters

* Fix field name

* Change field name

* Change field value

* Remove comments

* forgot to upgrade pyproject.toml

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-08-24 13:27:15 +02:00
bogdankostic
b03de53716
Use random_sample instead of ndarray for random array (#3083) 2022-08-22 13:19:45 +02:00
Massimiliano Pippi
97a8d30512
feat: Allow exact list matching with field in Elasticsearch filtering (#2988)
* ES filtering - allow exact list matching with field

typing fix

Update Documentation & Code Style

remove default hit limit in filtering queries

Update Documentation & Code Style

pytest es list eq filter

Update Documentation & Code Style

* review feedback

* fixed test

Co-authored-by: Krak91 <45461739+Krak91@users.noreply.github.com>
2022-08-22 12:42:37 +02:00
Igor Tarlinskiy
5b06658670
Forbid the key id from Documents to be written in WeaviateDocumentStore (#2846)
* Raise error upon duplicate document key found within meta info

* value error msg fix

* Update Documentation & Code Style

* Raise exception instead of asserting

* Update Documentation & Code Style

* add test
2022-08-12 17:50:54 +02:00
Dmitry Goryunov
da7836a931
feat: Support embedding dimensions on DeepsetCloudDocumentStore (#2995)
* Add embedding_dim to dc store

* Remove similarity from query params, it is not used

* Remove unused `return_embedding` parameter

* Remove unused param

* Update the documentation

* Update schemas

* Revert openapi changes

* Revert openapi changes

* Fix openapi

* Fix json schema

* Improve docstrings

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Improve logs

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Update the docs

* Fix similarity

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-08-12 11:46:52 +02:00
tstadel
668fd548a6
Fix embeddings_field_supports_similarity of OpenSearchDocumentStore when creating index (#3030)
* fix embeddings_field_supports_similarity when creating index

* fix test
2022-08-12 11:19:59 +02:00
Zoltan Fedor
f4128d3581
Adding support for additional distance/similarity metrics for Weaviate (#3001)
* Adding support for additional distance metrics for Weaviate

Fixes #3000

* Updating the docs

* Fixing error texts

* Fixing issues raised by the review

* Addressing the last issue from the reviews - removing test `test_weaviate.py::test_similarity`

* [EMPTY] Re-trigger CI

* Fixing things based on review

* [EMPTY] Re-trigger CI
2022-08-11 09:48:21 +02:00
James Briggs
5d4e3bd7ca
convert to set so not relying on correct order (#3015) 2022-08-10 12:57:31 +02:00
James Briggs
524c9b959d
switch label variables in test_labels (#3011) 2022-08-10 12:01:57 +02:00
Massimiliano Pippi
40d07c2038
Enable Opensearch unit tests in Windows CI (#2936)
* enable Opensearch unit tests under Win

* move unit tests into a dedicated job

* skip audio tests on missing dependencies

* avoid failing test collection when soundfile is not available

* Update .github/workflows/tests.yml

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-08-03 19:19:07 +02:00
Steven Haley
6b7d4a0514
Bug fix Weaviate document deletion (#2899)
* Bug fix Weaviate document deletion

If no filters param is passed in, then the original code retrieves *all* documents before then deleting by their IDs. There's no need for that, since we can delete by their IDs directly.

* Edit comment to clarify deletion and recreation

* Write unit tests for bug fix
2022-07-29 17:21:25 +02:00