16 Commits

Author SHA1 Message Date
ramgarg102
51f0a56e5d
delete_all_documents() replaced by delete_documents() (#1377)
* [UPDT] delete_all_documents() replaced by delete_documents()

* [UPDT] warning logs to be fixed

* [UPDT] delete_all_documents() renamed and the same method added

Co-authored-by: Ram Garg <ramgarg102@gmai.com>
2021-08-30 15:18:28 +02:00
vblagoje
02fc4c7783
Improve document stores unit test parametrization (#1202) 2021-06-22 16:08:23 +02:00
Branden Chan
aa6f768efa
Prevent merge of same questions on different documents during evaluation (#1119)
* Fix duplicate question in Reader.eval()

* Add duplicate question support in document store

* Support duplicate questions in retriever eval

* Update tutorial

* Rename key_tuple

* Change error message

* Add warning when more than 6 labels

* Allow for label grouping options

* Add support for aggregating by label meta

* Satisfy mypy

* Make label field flexible, add docstrings

* Satisfy mypy

* Fix failing tests

* Adjust docstring

* Fix tutorial

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-06-02 12:09:03 +02:00
Ikram Ali
b76ed4c5a4
Add options for handling duplicate documents (skip, fail, overwrite) (#1088)
* [document_stores] Duplicate document implementation added for memorystore.

* [document_stores] Duplicate documents implementation done for faiss store.

* [document_store] Duplicate document feature added for elasticsearch document store fixed #1069

* [document_store] Duplicate documents feature added for milvus document store and bug fixed in faiss document store fixed #1069

* [document_store] Code refactored fixed #1069

* [document_store] Test cases refactored.

* [document_store] mypy issue fixed.

* [test_case] faiss and milvus test case refactored to support duplicate documents implementation. fixed #1069

* [document_store] duplicate_documents_options code refactored.

* [document_store] Code refactored.
2021-05-25 13:30:06 +02:00
Ikram Ali
4ab1bc3c3e
Improve the progress bar in update_embeddings() + Fix filters in update_embeddings() (#1063)
* [document_stores] Add a progress bar in update_embeddings() to track overall document progress, closed #1037

* change 2nd level loop to docs. switch to tqdm.auto.

* [document_stores] Elasticsearch new method get_document_without_embedding_count() added.

* [test_case] Elasticsearch documentstore get_document_without_embedding_count() test case added.

* [document_stores] Add new bool arg in get_document_count() method and fixed #1082

* [document_stores] typo fixed #1082

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-05-21 14:18:07 +02:00
Lalit Pagaria
f46b09c756
Using text hash as id to prevent document duplication (#1000)
* using text hash as id to prevent document duplication. Also providing a way to customize it.

* Add latest docstring and tutorial changes

* Fixing duplicate value test when text is same

* Adding test for duplicate ids in document store

* Changing exception to generic Exception type

* add exception for inmemory. update docstring Document. remove id_hash_keys from object attribute

* Add latest docstring and tutorial changes

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-05-17 17:51:52 +02:00
Ikram Ali
a06e4450d1
Rename delete_all_documents() method to delete_documents() (#1047) 2021-05-10 13:37:08 +02:00
oryx1729
8c1e411380
Fix update_embeddings() for FAISSDocumentStore (#978) 2021-04-21 09:56:35 +02:00
Malte Pietsch
e641bff7a6
Allow more options for elasticsearch client (auth, multiple hosts) (#845)
* allow more options for elasticsearch client (auth, multiple hosts)

* Add latest docstring and tutorial changes

* fix mypy

* Add latest docstring and tutorial changes

* test client connection via ping()

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-02-19 14:29:59 +01:00
Malte Pietsch
47aae14efa
relax assert precision of arrays 2021-02-15 14:52:13 +01:00
oryx1729
4059805d89
Fix ElasticsearchDocumentStore.query_by_embedding() (#823) 2021-02-12 14:57:06 +01:00
Tanay Soni
fd5c5dd23c
Introduce incremental updates for embeddings in document stores (#812) 2021-02-09 21:25:01 +01:00
Tanay Soni
b87dd244c1
Get metadata values for a key from Elasticsearch (#776) 2021-02-01 16:13:26 +01:00
Lalit Pagaria
9f7f95221f
Milvus integration (#771)
* Initial commit for Milvus integration

* Add latest docstring and tutorial changes

* Updating implementation of Milvus document store

* Add latest docstring and tutorial changes

* Adding tests and updating doc string

* Add latest docstring and tutorial changes

* Fixing issue caught by tests

* Addressing review comments

* Fixing mypy detected issue

* Fixing issue caught in test about sorting of vector ids

* fixing test

* Fixing generator test failure

* update docstrings

* Addressing review comments about multiple network calls while fetching embeddings from milvus server

* Add latest docstring and tutorial changes

* Ignoring mypy issue while converting vector_id to int

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-01-29 13:29:12 +01:00
Tanay Soni
d9f011da9a
Add flag for use of window queries in SQLDocumentStore (#768) 2021-01-25 12:54:34 +01:00
Tanay Soni
f0aa879a1c
Fix delete_all_documents for the SQLDocumentStore (#761) 2021-01-22 14:39:24 +01:00