Standardize similarity argument description (#1684)

* Standardize argument similarity argument description

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-02 14:53:26 +01:00
commit 4ca1937775
parent 27793814cf
4 changed files with 8 additions and 9 deletions
--- a/docs/_src/api/api/document_store.md
+++ b/docs/_src/api/api/document_store.md
@@ -200,7 +200,7 @@ A DocumentStore using Elasticsearch to store and query the documents for our sea
                     If set to 'wait_for', continue only after changes are visible (slow, but safe).
                     If set to 'false', continue directly (fast, but sometimes unintuitive behaviour when docs are not immediately available after ingestion).
                     More info at https://www.elastic.co/guide/en/elasticsearch/reference/6.8/docs-refresh.html
- `similarity`: The similarity function used to compare document vectors. 'dot_product' is the default sine it is
+- `similarity`: The similarity function used to compare document vectors. 'dot_product' is the default since it is
                   more performant with DPR embeddings. 'cosine' is recommended if you are using a Sentence BERT model.
 - `timeout`: Number of seconds after which an ElasticSearch request times out.
 - `return_embedding`: To return document embedding
@@ -1419,9 +1419,7 @@ As a rule of thumb, we would see a 30% ~ 50% increase in the search performance
 Note that an overly large index_file_size value may cause failure to load a segment into the memory or graphics memory.
 (From https://milvus.io/docs/v1.0.0/performance_faq.md#How-can-I-get-the-best-performance-from-Milvus-through-setting-index_file_size)
 - `similarity`: The similarity function used to compare document vectors. 'dot_product' is the default and recommended for DPR embeddings.
-                   'cosine' is recommended for Sentence Transformers, but is not directly supported by Milvus.
-                   However, Haystack can normalize your embeddings and use `dot_product` to get the same results.
-                   See https://milvus.io/docs/v1.0.0/metric.md?Inner-product-(IP)`floating`.
+                   'cosine' is recommended for Sentence Transformers.
 - `index_type`: Type of approximate nearest neighbour (ANN) index used. The choice here determines your tradeoff between speed and accuracy.
                   Some popular options:
                   - FLAT (default): Exact method, slow
@@ -1712,6 +1710,7 @@ The current implementation is not supporting the storage of labels, so you canno
                   If no Reader is used (e.g. in FAQ-Style QA) the plain content of this field will just be returned.
 - `name_field`: Name of field that contains the title of the the doc
 - `similarity`: The similarity function used to compare document vectors. 'dot_product' is the default.
+                   'cosine' is recommended for Sentence Transformers.
 - `index_type`: Index type of any vector object defined in weaviate schema. The vector index type is pluggable.
                   Currently, HSNW is only supported.
                   See: https://www.semi.technology/developers/weaviate/current/more-resources/performance.html
--- a/haystack/document_stores/elasticsearch.py
+++ b/haystack/document_stores/elasticsearch.py
@@ -89,7 +89,7 @@ class ElasticsearchDocumentStore(BaseDocumentStore):
                             If set to 'wait_for', continue only after changes are visible (slow, but safe).
                             If set to 'false', continue directly (fast, but sometimes unintuitive behaviour when docs are not immediately available after ingestion).
                             More info at https://www.elastic.co/guide/en/elasticsearch/reference/6.8/docs-refresh.html
-        :param similarity: The similarity function used to compare document vectors. 'dot_product' is the default sine it is
+        :param similarity: The similarity function used to compare document vectors. 'dot_product' is the default since it is
                           more performant with DPR embeddings. 'cosine' is recommended if you are using a Sentence BERT model.
        :param timeout: Number of seconds after which an ElasticSearch request times out.
        :param return_embedding: To return document embedding
--- a/haystack/document_stores/milvus.py
+++ b/haystack/document_stores/milvus.py
@@ -70,9 +70,7 @@ class MilvusDocumentStore(SQLDocumentStore):
         Note that an overly large index_file_size value may cause failure to load a segment into the memory or graphics memory.
         (From https://milvus.io/docs/v1.0.0/performance_faq.md#How-can-I-get-the-best-performance-from-Milvus-through-setting-index_file_size)
        :param similarity: The similarity function used to compare document vectors. 'dot_product' is the default and recommended for DPR embeddings.
-                           'cosine' is recommended for Sentence Transformers, but is not directly supported by Milvus.
-                           However, Haystack can normalize your embeddings and use `dot_product` to get the same results.
-                           See https://milvus.io/docs/v1.0.0/metric.md?Inner-product-(IP)#floating.
+                           'cosine' is recommended for Sentence Transformers.
        :param index_type: Type of approximate nearest neighbour (ANN) index used. The choice here determines your tradeoff between speed and accuracy.
                           Some popular options:
                           - FLAT (default): Exact method, slow
@@ -213,7 +211,8 @@ class MilvusDocumentStore(SQLDocumentStore):
                    for doc in document_batch:
                        doc_ids.append(doc.id)
                        if isinstance(doc.embedding, np.ndarray):
-                            if self.similarity=="cosine": self.normalize_embedding(doc.embedding)
+                            if self.similarity=="cosine":
+                                self.normalize_embedding(doc.embedding)
                            embeddings.append(doc.embedding.tolist())
                        elif isinstance(doc.embedding, list):
                            if self.similarity=="cosine":
--- a/haystack/document_stores/weaviate.py
+++ b/haystack/document_stores/weaviate.py
@@ -69,6 +69,7 @@ class WeaviateDocumentStore(BaseDocumentStore):
                           If no Reader is used (e.g. in FAQ-Style QA) the plain content of this field will just be returned.
        :param name_field: Name of field that contains the title of the the doc
        :param similarity: The similarity function used to compare document vectors. 'dot_product' is the default.
+                           'cosine' is recommended for Sentence Transformers.
        :param index_type: Index type of any vector object defined in weaviate schema. The vector index type is pluggable.
                           Currently, HSNW is only supported.
                           See: https://www.semi.technology/developers/weaviate/current/more-resources/performance.html
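
Note on the Milvus hunk: the docstring text removed here said that Haystack can normalize embeddings and use `dot_product` to get the same results as `cosine`, and the code still calls `self.normalize_embedding(doc.embedding)` when `self.similarity=="cosine"`. The sketch below illustrates why that works, under stated assumptions. It is not Haystack's implementation: `l2_normalize`, `dot`, and `cosine` are hypothetical helpers, and plain Python lists stand in for the NumPy arrays used in the store.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Return a unit-length copy of vec (hypothetical helper for illustration)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        # A zero vector has no direction; return it unchanged.
        return list(vec)
    return [x / norm for x in vec]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity computed directly from the raw vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Made-up example vectors, not taken from any real model.
query = [3.0, 4.0]
doc = [1.0, 2.0]

# Inner product of the normalized vectors ...
score_via_dot = dot(l2_normalize(query), l2_normalize(doc))
# ... matches cosine similarity of the raw vectors.
score_via_cosine = cosine(query, doc)
```

Because both sides are scaled to unit length, a backend that only offers an inner-product metric can still rank documents by cosine similarity, provided the query embedding is normalized the same way as the stored document embeddings.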