haystack/docs-website/docs/pipeline-components/retrievers/weaviateembeddingretriever.mdx
Daria Fokina 510d063612
style(docs): params as inline code (#10017)
* params as inline code

* more params

* even more params

* last params
2025-11-05 14:49:38 +01:00

---
title: "WeaviateEmbeddingRetriever"
id: weaviateembeddingretriever
slug: "/weaviateembeddingretriever"
description: "This is an embedding Retriever compatible with the Weaviate Document Store."
---
# WeaviateEmbeddingRetriever
This is an embedding Retriever compatible with the Weaviate Document Store.
<div className="key-value-table">
| | |
| --- | --- |
| **Most common position in a pipeline** | 1. After a Text Embedder and before a [`PromptBuilder`](../builders/promptbuilder.mdx) in a RAG pipeline <br /> 2. The last component in the semantic search pipeline <br /> 3. After a Text Embedder and before an [`ExtractiveReader`](../readers/extractivereader.mdx) in an extractive QA pipeline |
| **Mandatory init variables** | `document_store`: An instance of a [WeaviateDocumentStore](../../document-stores/weaviatedocumentstore.mdx) |
| **Mandatory run variables** | `query_embedding`: A list of floats |
| **Output variables** | `documents`: A list of documents |
| **API reference** | [Weaviate](/reference/integrations-weaviate) |
| **GitHub link** | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/weaviate |
</div>
## Overview
The `WeaviateEmbeddingRetriever` is an embedding-based Retriever compatible with the [`WeaviateDocumentStore`](../../document-stores/weaviatedocumentstore.mdx). It compares the query embedding with the Document embeddings and returns the Documents in the `WeaviateDocumentStore` whose embeddings are closest to the query.
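To make the comparison step concrete, here is a minimal, library-free sketch of embedding retrieval. It assumes cosine similarity and a brute-force scan, which is not necessarily how Weaviate searches internally. The `retrieve` helper and the three-dimensional toy embeddings are hypothetical and exist only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_embedding, docs, top_k=2):
    """Return the top_k (content, score) pairs, most similar first."""
    scored = [
        (content, cosine_similarity(query_embedding, embedding))
        for content, embedding in docs
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical 3-dimensional embeddings, for illustration only.
toy_docs = [
    ("languages", [0.9, 0.1, 0.0]),
    ("elephants", [0.0, 1.0, 0.0]),
    ("bioluminescence", [0.1, 0.0, 0.9]),
]

top = retrieve([1.0, 0.0, 0.0], toy_docs, top_k=2)
print([content for content, _ in top])
```

In a real setup, the embeddings come from an Embedder and the search runs inside Weaviate; the Retriever only passes the query embedding and options to the Document Store.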
### Parameters
When using the `WeaviateEmbeddingRetriever` in your NLP system, ensure the query and Document [embeddings](../embedders.mdx) are available. You can do so by adding a Document Embedder to your indexing Pipeline and a Text Embedder to your query Pipeline.
In addition to the `query_embedding`, the `WeaviateEmbeddingRetriever` accepts other optional parameters, including `top_k` (the maximum number of Documents to retrieve) and `filters` to narrow down the search space.
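As a rough illustration, the optional arguments could be assembled like this. The filter uses Haystack's dictionary filter syntax, and `meta.language` is a hypothetical metadata field that your Documents may not have. The final call is left as a comment because it needs a running Weaviate instance.

```python
# Filter in Haystack's filter syntax.
# "meta.language" is a hypothetical metadata field used only for illustration.
filters = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.language", "operator": "==", "value": "en"},
    ],
}

# Optional run arguments, collected in one place.
run_kwargs = {
    "query_embedding": [0.1] * 768,  # fake vector to keep the example simple
    "top_k": 3,  # return at most three Documents
    "filters": filters,
}

# With a running Weaviate instance and a `retriever` initialized with a
# WeaviateDocumentStore, the call would look like:
# result = retriever.run(**run_kwargs)
```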
You can also specify `distance`, the maximum allowed distance between embeddings, and `certainty`, the normalized distance between the result items and the search embedding. The behavior of `distance` depends on the distance metric configured for the collection. See the [official Weaviate documentation](https://weaviate.io/developers/weaviate/api/graphql/search-operators#variables) for more information.
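The sketch below shows how the two thresholds relate, assuming the collection uses the cosine metric, where distance ranges from 0 to 2 and certainty is `1 - distance / 2`. Other metrics behave differently, so treat this as an illustration rather than a description of Weaviate's internals. The helper names and the sample distances are hypothetical.

```python
def certainty_from_cosine_distance(distance):
    """Map a cosine distance in [0, 2] to a certainty in [0, 1]."""
    return 1 - distance / 2


def within_distance(results, max_distance):
    """Keep only (content, distance) pairs at or below max_distance."""
    return [(content, d) for content, d in results if d <= max_distance]


# Hypothetical (content, cosine distance) pairs, for illustration only.
results = [("languages", 0.2), ("bioluminescence", 0.9), ("elephants", 1.4)]

kept = within_distance(results, max_distance=1.0)
certainties = [certainty_from_cosine_distance(d) for _, d in kept]
```

A smaller `distance` threshold and a larger `certainty` threshold both make retrieval stricter.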
The embedding similarity function depends on the vectorizer used in the `WeaviateDocumentStore` collection. Check out the [official Weaviate documentation](https://weaviate.io/developers/weaviate/modules/retriever-vectorizer-modules) to see all the supported vectorizers.
## Usage
### Installation
To start using Weaviate with Haystack, install the package with:
```shell
pip install weaviate-haystack
```
### On its own
This Retriever needs an instance of `WeaviateDocumentStore` and indexed Documents to run.
```python
from haystack_integrations.document_stores.weaviate.document_store import WeaviateDocumentStore
from haystack_integrations.components.retrievers.weaviate import WeaviateEmbeddingRetriever

document_store = WeaviateDocumentStore(url="http://localhost:8080")

retriever = WeaviateEmbeddingRetriever(document_store=document_store)

# using a fake vector to keep the example simple
retriever.run(query_embedding=[0.1] * 768)
```
### In a Pipeline
```python
from haystack.document_stores.types import DuplicatePolicy
from haystack import Document
from haystack import Pipeline
from haystack.components.embedders import (
    SentenceTransformersTextEmbedder,
    SentenceTransformersDocumentEmbedder,
)

from haystack_integrations.document_stores.weaviate.document_store import (
    WeaviateDocumentStore,
)
from haystack_integrations.components.retrievers.weaviate import (
    WeaviateEmbeddingRetriever,
)

document_store = WeaviateDocumentStore(url="http://localhost:8080")

documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(
        content="Elephants have been observed to behave in a way that indicates a high level of self-awareness, such as recognizing themselves in mirrors."
    ),
    Document(
        content="In certain parts of the world, like the Maldives, Puerto Rico, and San Diego, you can witness the phenomenon of bioluminescent waves."
    ),
]

# Indexing: embed the Documents and write them to the Document Store
document_embedder = SentenceTransformersDocumentEmbedder()
document_embedder.warm_up()
documents_with_embeddings = document_embedder.run(documents)

document_store.write_documents(
    documents_with_embeddings.get("documents"), policy=DuplicatePolicy.OVERWRITE
)

# Querying: embed the query text and pass the embedding to the Retriever
query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder())
query_pipeline.add_component(
    "retriever", WeaviateEmbeddingRetriever(document_store=document_store)
)
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "How many languages are there?"

result = query_pipeline.run({"text_embedder": {"text": query}})

# Print the best-matching Document
print(result["retriever"]["documents"][0])
```