mirror of
https://github.com/deepset-ai/haystack.git
synced 2025-12-30 00:30:09 +00:00
Standardize similarity argument description (#1684)
* Standardize argument similarity argument description * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
This commit is contained in:
parent
27793814cf
commit
4ca1937775
@ -200,7 +200,7 @@ A DocumentStore using Elasticsearch to store and query the documents for our sea
|
||||
If set to 'wait_for', continue only after changes are visible (slow, but safe).
|
||||
If set to 'false', continue directly (fast, but sometimes unintuitive behaviour when docs are not immediately available after ingestion).
|
||||
More info at https://www.elastic.co/guide/en/elasticsearch/reference/6.8/docs-refresh.html
|
||||
- `similarity`: The similarity function used to compare document vectors. 'dot_product' is the default sine it is
|
||||
- `similarity`: The similarity function used to compare document vectors. 'dot_product' is the default since it is
|
||||
more performant with DPR embeddings. 'cosine' is recommended if you are using a Sentence BERT model.
|
||||
- `timeout`: Number of seconds after which an ElasticSearch request times out.
|
||||
- `return_embedding`: To return document embedding
|
||||
@ -1419,9 +1419,7 @@ As a rule of thumb, we would see a 30% ~ 50% increase in the search performance
|
||||
Note that an overly large index_file_size value may cause failure to load a segment into the memory or graphics memory.
|
||||
(From https://milvus.io/docs/v1.0.0/performance_faq.md#How-can-I-get-the-best-performance-from-Milvus-through-setting-index_file_size)
|
||||
- `similarity`: The similarity function used to compare document vectors. 'dot_product' is the default and recommended for DPR embeddings.
|
||||
'cosine' is recommended for Sentence Transformers, but is not directly supported by Milvus.
|
||||
However, Haystack can normalize your embeddings and use `dot_product` to get the same results.
|
||||
See https://milvus.io/docs/v1.0.0/metric.md?Inner-product-(IP)`floating`.
|
||||
'cosine' is recommended for Sentence Transformers.
|
||||
- `index_type`: Type of approximate nearest neighbour (ANN) index used. The choice here determines your tradeoff between speed and accuracy.
|
||||
Some popular options:
|
||||
- FLAT (default): Exact method, slow
|
||||
@ -1712,6 +1710,7 @@ The current implementation is not supporting the storage of labels, so you canno
|
||||
If no Reader is used (e.g. in FAQ-Style QA) the plain content of this field will just be returned.
|
||||
- `name_field`: Name of field that contains the title of the the doc
|
||||
- `similarity`: The similarity function used to compare document vectors. 'dot_product' is the default.
|
||||
'cosine' is recommended for Sentence Transformers.
|
||||
- `index_type`: Index type of any vector object defined in weaviate schema. The vector index type is pluggable.
|
||||
Currently, HSNW is only supported.
|
||||
See: https://www.semi.technology/developers/weaviate/current/more-resources/performance.html
|
||||
|
||||
@ -89,7 +89,7 @@ class ElasticsearchDocumentStore(BaseDocumentStore):
|
||||
If set to 'wait_for', continue only after changes are visible (slow, but safe).
|
||||
If set to 'false', continue directly (fast, but sometimes unintuitive behaviour when docs are not immediately available after ingestion).
|
||||
More info at https://www.elastic.co/guide/en/elasticsearch/reference/6.8/docs-refresh.html
|
||||
:param similarity: The similarity function used to compare document vectors. 'dot_product' is the default sine it is
|
||||
:param similarity: The similarity function used to compare document vectors. 'dot_product' is the default since it is
|
||||
more performant with DPR embeddings. 'cosine' is recommended if you are using a Sentence BERT model.
|
||||
:param timeout: Number of seconds after which an ElasticSearch request times out.
|
||||
:param return_embedding: To return document embedding
|
||||
|
||||
@ -70,9 +70,7 @@ class MilvusDocumentStore(SQLDocumentStore):
|
||||
Note that an overly large index_file_size value may cause failure to load a segment into the memory or graphics memory.
|
||||
(From https://milvus.io/docs/v1.0.0/performance_faq.md#How-can-I-get-the-best-performance-from-Milvus-through-setting-index_file_size)
|
||||
:param similarity: The similarity function used to compare document vectors. 'dot_product' is the default and recommended for DPR embeddings.
|
||||
'cosine' is recommended for Sentence Transformers, but is not directly supported by Milvus.
|
||||
However, Haystack can normalize your embeddings and use `dot_product` to get the same results.
|
||||
See https://milvus.io/docs/v1.0.0/metric.md?Inner-product-(IP)#floating.
|
||||
'cosine' is recommended for Sentence Transformers.
|
||||
:param index_type: Type of approximate nearest neighbour (ANN) index used. The choice here determines your tradeoff between speed and accuracy.
|
||||
Some popular options:
|
||||
- FLAT (default): Exact method, slow
|
||||
@ -213,7 +211,8 @@ class MilvusDocumentStore(SQLDocumentStore):
|
||||
for doc in document_batch:
|
||||
doc_ids.append(doc.id)
|
||||
if isinstance(doc.embedding, np.ndarray):
|
||||
if self.similarity=="cosine": self.normalize_embedding(doc.embedding)
|
||||
if self.similarity=="cosine":
|
||||
self.normalize_embedding(doc.embedding)
|
||||
embeddings.append(doc.embedding.tolist())
|
||||
elif isinstance(doc.embedding, list):
|
||||
if self.similarity=="cosine":
|
||||
|
||||
@ -69,6 +69,7 @@ class WeaviateDocumentStore(BaseDocumentStore):
|
||||
If no Reader is used (e.g. in FAQ-Style QA) the plain content of this field will just be returned.
|
||||
:param name_field: Name of field that contains the title of the the doc
|
||||
:param similarity: The similarity function used to compare document vectors. 'dot_product' is the default.
|
||||
'cosine' is recommended for Sentence Transformers.
|
||||
:param index_type: Index type of any vector object defined in weaviate schema. The vector index type is pluggable.
|
||||
Currently, HSNW is only supported.
|
||||
See: https://www.semi.technology/developers/weaviate/current/more-resources/performance.html
|
||||
|
||||
Loading…
x
Reference in New Issue
Block a user