tstadel
6ca88bfd23
fix: Despite return_embedding=False SearchEngineDocumentStore.query retrieves embedding_field ( #3662 )
...
* fix: Despite return_embedding=False SearchEngineDocumentStore.query retrieves embedding_field
* fix pylint
* add tests
* fix mypy
* fix merge
* format
* fix pylint
* move tests to SearchEngineDocumentStoreTestAbstract
* move missed constants
* add mocked_document_store fixture to TestElasticsearchDocumentStore
* fix mocked_document_store
* fix get_all_documents tests for elasticsearch>=7.16
* fix tests
* fix tests try 2
2023-01-09 11:58:23 +01:00
Julian Risch
0c2d13f1b8
bug: skip validating empty embeddings ( #3774 )
...
* skip validating empty embeddings
* skip batches without embeddings to update
* add unit test with mocked retriever
2023-01-05 15:13:57 +01:00
tstadel
6c067b2b4f
feat: make `score_script` first class citizen via `knn_engine` param ( #3284 )
...
* OpenSearchDocumentStore: make score_script accessible via knn_engine
* blacken
* fix tests
* fix format
* fix naming of 'score_script' consistently
* fix tests
* fix test
* fix ef_search tests
* always validate index
* improve clone_embedding_field
* fix pylint
* reformat
* remove port
* update tests
* set no_implicit_optional = false
* fix mypy
* fix test
* refactorings
* reformat
* fix and refactor tests
* better tests
* create search_field mappings
* remove no_implicit_optional = false
* skip validation for custom mapping
* format
* Apply suggestions from docs code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply tougher suggestions from code review
* fix messages
* fix typos
* update tests
* Update haystack/document_stores/opensearch.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* fix tests
* fix ef_search validation
* add test for ef_search nmslib
* fix assert_not_called
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-12-27 15:24:31 +01:00
Zoltan Fedor
e143f7cc36
Fixing broken BM25 support with Weaviate - fixes #3720 ( #3723 )
...
* Fixing broken BM25 support with Weaviate - fixes #3720
Unfortunately the BM25 support with Weaviate got broken with Haystack v1.11.0+, which is getting fixed with this commit.
Please see more under issue #3720 .
* Fixing mypy issue - method signature wasn't matching the base class
* Mypy related test fix
Mypy forced me to set the signature of the `query` method of the Weaviate document store to the same as its parent, the `KeywordDocumentStore`, where the `query` param is `Optional` but has NO default value, so it must be provided (as None) at runtime.
I am not quite sure why the abstract method's `query` param was set without a default value while its type is `Optional`, but I didn't want to change that, so instead I have changed the Weaviate tests.
* Adding a note regarding an upcoming fix in Weaviate v1.17.0
* Apply suggestions from code review
* revert
* [EMPTY] Re-trigger CI
2022-12-19 17:24:46 +01:00
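The signature point raised in the Mypy note of the commit above can be illustrated with a minimal sketch. The class names `KeywordStoreSketch` and `WeaviateStoreSketch` are hypothetical stand-ins, not the Haystack classes, and the returned strings are placeholders. The sketch only shows that a parameter typed `Optional[str]` without a default value still has to be passed explicitly, even when the value is `None`.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class KeywordStoreSketch(ABC):
    """Hypothetical base class: `query` is Optional but has no default."""

    @abstractmethod
    def query(self, query: Optional[str], top_k: int = 10) -> List[str]:
        ...


class WeaviateStoreSketch(KeywordStoreSketch):
    """Hypothetical subclass whose signature matches the base class."""

    def query(self, query: Optional[str], top_k: int = 10) -> List[str]:
        if query is None:
            # No query text: return a placeholder for "everything".
            return ["all documents"][:top_k]
        return [f"match for {query}"][:top_k]


store = WeaviateStoreSketch()

# Passing None explicitly is accepted.
results_none = store.query(None)

# Omitting the argument fails at call time, because Optional[...] only
# describes the accepted type and does not supply a default value.
try:
    store.query()  # type: ignore[call-arg]
    missing_arg_raises = False
except TypeError:
    missing_arg_raises = True
```

This is why tests calling such a method without arguments need to pass `None` explicitly once the signature matches the base class.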
James Briggs
520b23ec1b
fix: pinecone metadata format ( #3660 )
...
* fix for multilevel metadata dictionaries
* add metadata dict formatting to update function
* typing
* added check for labels meta
* added more info to input parameters
* added test for multilayer metadata
* removed todo
2022-12-13 10:11:24 +01:00
tstadel
600dc2d611
refactor: filters type ( #3682 )
...
* consolidate filters type
* remove unnecessary optionals
* fix mypy
* fix pylint
* fix pylint
* move FilterType to schema
* remove Optional from FilterType
* move to Dict[str, Any]
* Revert "move to Dict[str, Any]"
This reverts commit e8c561bb7885949e19825697fa4c469945f90ce5.
* fix mypy
* fix pylint
* revert isort changes in elasticsearch
* remove todos in milvus.py
* remove todos in sql.py
* add aggregate_labels tests
* consolidate aggregate_labels tests
* remove superfluous type todos
* remove ALL superfluous #todos
2022-12-12 14:04:29 +01:00
tstadel
c1c1c97bb2
feat: add query_by_embedding_batch ( #3546 )
...
* add query_by_embedding_batch
* fix mypy
* fix pylint
* add test
* move query_by_embedding_batch to search_engine
* fix and add tests
* fix pylint
* remove Retriever query logs
* add test for multimodal batch retrieval
* allow for np.ndarray
2022-12-08 08:28:43 +01:00
Sara Zan
fc89f6ea74
fix: revert Weaviate query with filters and improve tests ( #3646 )
...
* revert weaviate query with filters and improve tests
* pylint
* upgrade weaviate container
* use latest docker tag
* fix text
* fix text
2022-12-06 14:48:58 +01:00
Massimiliano Pippi
b20f808119
refactor: move more tests to the base class ( #3637 )
...
* move more tests to the base class
* skip tests where unsupported
* do not pass index label explicitly
* skip test for Pinecone
2022-11-29 08:43:27 +01:00
Sara Zan
eb7b9452d0
refactor: Weaviate query with filters ( #3628 )
2022-11-28 12:26:33 +01:00
Massimiliano Pippi
c6890c3e86
chore: remove redundant tests ( #3620 )
...
* remove redundant tests
* skip test on win
* fix missing import
* revert mistake
* revert
2022-11-25 20:55:21 +05:30
Massimiliano Pippi
a15af7f8c3
refactor: Move `InMemoryDocumentStore` tests to their own class ( #3614 )
...
* move tests to their own class
* move more tests
* add specific job
* fix test
* Update test/document_stores/test_memory.py
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-11-23 15:33:46 +01:00
Stefano Fiorucci
3040e59c63
feat: add support for `BM25Retriever` in `InMemoryDocumentStore` ( #3561 )
...
* very first draft
* implement query and query_batch
* add more bm25 parameters
* add rank_bm25 dependency
* fix mypy
* remove tokenizer callable parameter
* remove unused import
* only json serializable attributes
* try to fix: pylint too-many-public-methods / R0904
* bm25 attribute always present
* convert errors into warnings to make the tutorial 1 work
* add docstrings; tests
* try to make tests run
* better docstrings; revert not running tests
* some suggestions from review
* rename elasticsearch retriever as bm25 in tests; try to test memory_bm25
* exclude tests with filters
* change elasticsearch to bm25 retriever in test_summarizer
* add tests
* try to improve tests
* better type hint
* adapt test_table_text_retriever_embedding
* handle non-textual docs
* query only textual documents
2022-11-22 09:24:52 +01:00
Massimiliano Pippi
ea75e2aab5
feat: store metadata using JSON in SQLDocumentStore ( #3547 )
...
* add warnings
* make the field cachable
* review comment
2022-11-18 08:26:19 +01:00
Massimiliano Pippi
1399681c81
move milvus tests to their own module ( #3596 )
2022-11-17 16:22:02 +01:00
Stefano Fiorucci
dc26e6d43e
fix: Flatten `DocumentClassifier` output in `SQLDocumentStore`; remove `_sql_session_rollback` hack in tests ( #3273 )
...
* first draft
* fix
* fix
* move test to test_sql
2022-11-16 12:20:57 +01:00
Massimiliano Pippi
ba75d39029
fix: discard metadata fields if not set in Weaviate ( #3578 )
...
* fix weaviate bug in returning embeddings and setting empty meta fields
* review comment
2022-11-15 22:02:53 +01:00
tstadel
6ce2d296f4
fix: Elasticsearch / OpenSearch brownfield function does not incorporate meta ( #3572 )
...
* fix meta bug
* adjust brownfield test
2022-11-15 12:13:21 +01:00
Stefano Fiorucci
9de56b0283
fix: write metadata to SQL Document Store when duplicate_documents!="overwrite" ( #3548 )
...
* add_all fixes the bug
* improved test
2022-11-15 10:04:04 +01:00
Massimiliano Pippi
6a48ace9b9
BREAKING CHANGE: remove Milvus1DocumentStore along with support for Milvus < 2.x ( #3552 )
...
* remove milvus1
* leftover
* revert deprecation process
2022-11-15 09:54:55 +01:00
Massimiliano Pippi
057a8c0b4f
refactor: Pinecone tests ( #3555 )
...
* add pytest option to unmock pinecone
* first try
* handle missing answer
* fix labels metadata
* more tests
* adapt workflow
* typo
* address review comments
2022-11-14 15:19:15 +01:00
Massimiliano Pippi
4dfddf0d10
refactor: Refactor Weaviate tests ( #3541 )
...
* refactor tests
* fix job
* revert
* revert
* revert
* use latest weaviate
* fix abstract methods signatures
* pass class_name to all the CRUD methods
* finish moving all the tests
* bump weaviate version
* raise, don't pass
2022-11-14 09:57:30 +01:00
Massimiliano Pippi
3319ef6d1c
refactor: refactor FAISS tests ( #3537 )
...
* fix write docs behaviour
* refactor FAISS tests
* do not remove the sqlite db
* try
* remove extra slash
* Apply suggestions from code review
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
* review comments
* Update test/document_stores/test_faiss.py
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
* review comments
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-11-08 16:37:01 +01:00
Massimiliano Pippi
255072d8d5
refactor: move dC tests to their own module and job ( #3529 )
...
* move dC tests to their own module and job
* restore global var
* revert
2022-11-04 17:05:10 +01:00
Massimiliano Pippi
2bb81331b7
feat: add SQLDocumentStore tests ( #3517 )
...
* port SQL tests
* cleanup document_store_tests.py from sql tests
* leftover
* Update .github/workflows/tests.yml
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
* review comments
* Update test/document_stores/test_base.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-11-04 09:24:19 +01:00
Stefano Fiorucci
4b0894f4c2
fix: support long texts for labels in `ElasticsearchDocumentStore` ( #3346 )
2022-11-02 11:16:36 +01:00
Massimiliano Pippi
b694c7b5cb
Document Store test refactoring ( #3449 )
...
* add new marker
* start using test hierarchies
* move ES tests into their own class
* refactor test workflow
* job steps
* add more tests
* move more tests
* more tests
* test labels
* add more tests
* Update tests.yml
* Update tests.yml
* fix
* typo
* fix es image tag
* map es ports
* try
* fix
* default port
* remove opensearch from the markers sorcery
* revert
* skip new tests in old jobs
* skip opensearch_faiss
2022-10-31 15:30:14 +01:00
Stefano Fiorucci
54ec13eaf7
refactor: Change `no_answer` attribute ( #3411 )
...
* always run validation
* update schemas
* no_answer as a property. break things!
* forgotten schema
* fix
* update openapi
* removed my unnecessary test
* fix sql document store
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-10-25 13:07:00 +02:00
Mayank Jobanputra
d48577b4e7
bug: removed duplicated meta "name" field addition to content before embedding in `update_embeddings` workflow ( #3368 )
...
* Removed explicit passage formatting by name field
* passing correct input type for embedding the docs
* Updated test, updated similarity scores and added results
* changed expected input to embed method
2022-10-25 14:52:05 +05:30
Stefano Fiorucci
7290196c32
fix: allow same `vector_id` in different indexes for SQL-based Document stores ( #3383 )
...
* fix_multiple_indexes
* improve test names
2022-10-14 09:55:56 +02:00
Massimiliano Pippi
31fa75e9fd
feat: add support for Elasticsearch 7.16.2 ( #3318 )
...
* bump elastic to 7.16.2+
* decouple Elasticsearch and Opensearch
use method override instead of func variables
fix mypy
default value
fix broken tests
update schema
* relax version pin
* rename the base class
* rename module
* fix import order
* do not run the new tests in the old job
* remove outdated TODO
2022-10-13 11:53:27 +02:00
tstadel
b84a6b1716
fix: opensearch script score with filters ( #3321 )
...
* fix opensearch script score filters
* add comment
* add integration test
* update schema
2022-10-06 15:41:29 +02:00
tstadel
05a86b9d3d
feat: FAISS in OpenSearch: Support HNSW for cosine ( #3217 )
...
* support cosine similarity with faiss
* update docs
* update api docs
* fix tests
* Revert "update api docs"
This reverts commit 6138fdfefb3beaee2d55c5729cd4a2745ea6b143.
* fix api docs
* collapse test
* rename similarity to space_type mappings
* only normalize for faiss
* fix merge
* fix docs normalization
* get rid of List[np.array]
* update docs
* fix tests and tutorials
* fix mypy
* fix mypy
* fix mypy again
* again mypy
* blacken
* update tutorial 4 docs
* fix embeddingretriever
* fix faiss
* move dense specific logic to DenseRetriever
* fix mypy
* cosine tests for all documents stores
* fix pinecone
* add docstring
* docstring corrections
* update docs
* add integration test marker
* docstrings update
* update docs
* fix typo
* update docs
* fix MockDenseRetriever
* run integration tests for all documentstores
* fix test_update_embeddings_cosine_similarity
* fix faiss tests not running
* blacken
* make test_cosine_sanity_check integration test
* split PR
* update docs
* manually revert tutorial doc change
* Fix embedding type
* set integration marker correctly
* make BaseDocumentStore.normalize_embedding static
* format
* fix handling of opensearch_faiss param
* fix merge
* add DenseRetriever typing
* organize imports in conftest.py
* organize imports in conftest.py (2)
* fix DenseRetriever import
* add opensearch-tests-linux
2022-09-23 13:26:49 +02:00
tstadel
4fa9d2d8e7
Fix milvus and faiss tests not running ( #3263 )
...
* fix milvus and faiss tests not running
* fix schema manually
* fix test_dpr_embedding test for milvus
* pip freeze on milvus tests
* fix milvus1 tests being executed: fix all_doc_stores order
* Revert "pip freeze on milvus tests"
This reverts commit 75ebb6f7e507bb8477e87d9e63b4a294f7946cab.
* make infer_required_doc_store more robust
* don't skip tests without docstore requirements
* use markers for docstore tests
2022-09-22 17:46:49 +02:00
tstadel
b10e2c392e
chore: add `DenseRetriever` abstraction ( #3252 )
...
* support cosine similarity with faiss
* update docs
* update api docs
* fix tests
* Revert "update api docs"
This reverts commit 6138fdfefb3beaee2d55c5729cd4a2745ea6b143.
* fix api docs
* collapse test
* rename similarity to space_type mappings
* only normalize for faiss
* fix merge
* fix docs normalization
* get rid of List[np.array]
* update docs
* fix tests and tutorials
* fix mypy
* fix mypy
* fix mypy again
* again mypy
* blacken
* update tutorial 4 docs
* fix embeddingretriever
* fix faiss
* move dense specific logic to DenseRetriever
* fix mypy
* cosine tests for all documents stores
* fix pinecone
* add docstring
* docstring corrections
* update docs
* add integration test marker
* docstrings update
* update docs
* fix typo
* update docs
* fix MockDenseRetriever
* run integration tests for all documentstores
* fix test_update_embeddings_cosine_similarity
* fix faiss tests not running
* blacken
* make test_cosine_sanity_check integration test
* update docs
* fix imports
* import DenseRetriever normally
* update docs
* fix deepcopy of documents
* update schema
* Revert "update schema"
This reverts commit 83cf8f323648468e1c322d54852bec084d637e3f.
* fix schema for ci manually
2022-09-21 19:08:54 +02:00
Stefano Fiorucci
89247b804c
refactor: make `TransformersDocumentClassifier` output consistent between different types of classification ( #3224 )
...
* make output consistent
* make output consistent
* added tests for details
* better tests
* Update test_document_classifier.py
* make black happy
* Update test_document_classifier.py
* Update test_document_classifier.py
2022-09-21 13:16:03 +02:00
Kristof Herrmann
da1cc577ae
feat: exponential backoff with exp decreasing batch size for opensearch client ( #3194 )
...
* Validate custom_mapping properly as an object
* Remove related test
* black
* feat: exponential backoff with exp dec batch size
* added docstring and split doc list
* fix
* fix mypy
* fix
* catch generic exception
* added test
* mypy ignore
* fixed no attribute
* added test
* added tests
* revert strange merge conflicts
* revert merge conflict again
* Update haystack/document_stores/elasticsearch.py
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
* done
* adjust test
* remove not required caplog
* fixed comments
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2022-09-13 14:30:30 +01:00
bogdankostic
e2ec0d1c15
feat: FAISS in OpenSearch: check existing index ( #3101 )
...
* Add check for mapping for existing indices
* Add test
* Check if "method" field exists
2022-08-25 17:33:26 +02:00
tstadel
92046ce5b5
feat: FAISS in OpenSearch: Support HNSW for dot product and l2 ( #3029 )
...
* support faiss hnsw
* blacken
* update docs
* improve similarity check
* add tests
* update schema
* set ef_search param correctly
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* regenerate docs
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-08-24 16:43:48 +02:00
James Briggs
9b1b03002f
update to PineconeDocumentStore to remove dependency on SQL db ( #2749 )
...
* update to PineconeDocumentStore to remove dependency on SQL db
* Update Documentation & Code Style
* typing fixes
* Update Documentation & Code Style
* fixed embedding generator to yield Documents
* Update Documentation & Code Style
* fixes for final typing issues
* fixes for pylint
* Update Documentation & Code Style
* uncomment pinecone tests
* added new params to docstrings
* Update Documentation & Code Style
* Update Documentation & Code Style
* Update haystack/document_stores/pinecone.py
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
* Update haystack/document_stores/pinecone.py
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
* Update Documentation & Code Style
* Update haystack/document_stores/pinecone.py
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
* Update haystack/document_stores/pinecone.py
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
* Update haystack/document_stores/pinecone.py
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
* Update haystack/document_stores/pinecone.py
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
* changes based on comments, updated errors and install
* Update Documentation & Code Style
* mypy
* implement simple filtering in pinecone mock
* typo
* typo in reverse
* account for missing meta key in filtering
* typo
* added metadata filtering to describe index
* added handling for users switching indexes in same doc store, and handling duplicate docs in write
* syntax tweaks
* added index option to document/embedding count calls
* labels implementation in progress
* added metadata fields to be indexed for pinecone tests
* further changes to mock
* WIP implementation of labels+multilabels
* switched to rely on labels namespace rather than filter
* simpler delete_labels
* label fixes, remove debug code
* Apply docstring fixes
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* mypy
* pylint
* docs
* temporarily un-mock Pinecone
* Small Pinecone test suite
* pylint
* Add fake test key to pass the None check
* Add again fake test key to pass the None check
* Add Pinecone to default docstores and fix filters
* Fix field name
* Change field name
* Change field value
* Remove comments
* forgot to upgrade pyproject.toml
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-08-24 13:27:15 +02:00
bogdankostic
b03de53716
Use `random_sample` instead of `ndarray` for random array ( #3083 )
2022-08-22 13:19:45 +02:00
Massimiliano Pippi
97a8d30512
feat: Allow exact list matching with field in Elasticsearch filtering ( #2988 )
...
* ES filtering - allow exact list matching with field
typing fix
Update Documentation & Code Style
remove default hit limit in filtering queries
Update Documentation & Code Style
pytest es list eq filter
Update Documentation & Code Style
* review feedback
* fixed test
Co-authored-by: Krak91 <45461739+Krak91@users.noreply.github.com>
2022-08-22 12:42:37 +02:00
Igor Tarlinskiy
5b06658670
Forbid the key `id` from `Document`s to be written in `WeaviateDocumentStore` ( #2846 )
...
* Raise error upon duplicate document key found within meta info
* value error msg fix
* Update Documentation & Code Style
* Raise exception instead of asserting
* Update Documentation & Code Style
* add test
2022-08-12 17:50:54 +02:00
Dmitry Goryunov
da7836a931
feat: Support embedding dimensions on DeepsetCloudDocumentStore ( #2995 )
...
* Add embedding_dim to dc store
* Remove similarity from query params, it is not used
* Remove unused `return_embedding` parameter
* Remove unused param
* Update the documentation
* Update schemas
* Revert openapi changes
* Revert openapi changes
* Fix openapi
* Fix json schema
* Improve docstrings
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Improve logs
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update the docs
* Fix similarity
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-08-12 11:46:52 +02:00
tstadel
668fd548a6
Fix `embeddings_field_supports_similarity` of `OpenSearchDocumentStore` when creating index ( #3030 )
...
* fix embeddings_field_supports_similarity when creating index
* fix test
2022-08-12 11:19:59 +02:00
Zoltan Fedor
f4128d3581
Adding support for additional distance/similarity metrics for Weaviate ( #3001 )
...
* Adding support for additional distance metrics for Weaviate
Fixes #3000
* Updating the docs
* Fixing error texts
* Fixing issues raised by the review
* Addressing the last issue from the reviews - removing test `test_weaviate.py::test_similarity`
* [EMPTY] Re-trigger CI
* Fixing things based on review
* [EMPTY] Re-trigger CI
2022-08-11 09:48:21 +02:00
James Briggs
5d4e3bd7ca
convert to set so not relying on correct order ( #3015 )
2022-08-10 12:57:31 +02:00
James Briggs
524c9b959d
switch label variables in test_labels ( #3011 )
2022-08-10 12:01:57 +02:00
Massimiliano Pippi
40d07c2038
Enable Opensearch unit tests in Windows CI ( #2936 )
...
* enable Opensearch unit tests under Win
* move unit tests into a dedicated job
* skip audio tests on missing dependencies
* avoid failing test collection when soundfile is not available
* Update .github/workflows/tests.yml
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-08-03 19:19:07 +02:00
Steven Haley
6b7d4a0514
Bug fix Weaviate document deletion ( #2899 )
...
* Bug fix Weaviate document deletion
If no filters param is passed in, then the original code retrieves *all* documents before then deleting by their IDs. There's no need for that, since we can delete by their IDs directly.
* Edit comment to clarify deletion and recreation
* Write unit tests for bug fix
2022-07-29 17:21:25 +02:00
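The deletion fix described in the commit above can be sketched roughly as follows. This is not the Weaviate document store code: `InMemoryStoreSketch`, its `fetch_count` counter, and the dictionary-based storage are all assumptions made for illustration. The sketch only shows the idea that documents are looked up first when a filter is given, while plain IDs are deleted directly.

```python
from typing import Any, Dict, List, Optional


class InMemoryStoreSketch:
    """Hypothetical store used only to illustrate the deletion logic."""

    def __init__(self, docs: Dict[str, Dict[str, Any]]) -> None:
        self.docs = dict(docs)
        self.fetch_count = 0  # counts how often documents are retrieved

    def get_all_documents(self, filters: Optional[Dict[str, Any]] = None) -> List[str]:
        self.fetch_count += 1
        return [
            doc_id
            for doc_id, meta in self.docs.items()
            if filters is None or all(meta.get(k) == v for k, v in filters.items())
        ]

    def delete_documents(
        self,
        ids: Optional[List[str]] = None,
        filters: Optional[Dict[str, Any]] = None,
    ) -> None:
        if filters is not None:
            # A filter requires a lookup to find the matching documents.
            matching = self.get_all_documents(filters)
            ids = matching if ids is None else [i for i in ids if i in matching]
        elif ids is None:
            # Neither ids nor filters: remove everything without fetching.
            self.docs.clear()
            return
        # With ids only, delete directly; no retrieval of all documents.
        for doc_id in ids:
            self.docs.pop(doc_id, None)


store = InMemoryStoreSketch(
    {"a": {"lang": "en"}, "b": {"lang": "de"}, "c": {"lang": "en"}}
)
store.delete_documents(ids=["a"])
fetches_after_id_delete = store.fetch_count
remaining_after_id_delete = sorted(store.docs)

store.delete_documents(filters={"lang": "en"})
fetches_after_filter_delete = store.fetch_count
remaining_after_filter_delete = sorted(store.docs)
```

Deleting by ID leaves the fetch counter untouched, while the filtered deletion performs exactly one lookup.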