---
title: "MistralTextEmbedder"
id: mistraltextembedder
slug: "/mistraltextembedder"
description: "This component transforms a string into a vector using the Mistral API and models. Use it for embedding retrieval to transform your query into an embedding."
---
# MistralTextEmbedder
This component transforms a string into a vector using the Mistral API and models. Use it for embedding retrieval to transform your query into an embedding.
| | |
| --- | --- |
| **Most common position in a pipeline** | Before an embedding [Retriever](/docs/pipeline-components/retrievers.mdx) in a query/RAG pipeline |
| **Mandatory init variables** | "api_key": The Mistral API key. Can be set with `MISTRAL_API_KEY` env var. |
| **Mandatory run variables** | "text": A string |
| **Output variables** | "embedding": A list of floats (the embedding vector) <br /> <br />"meta": A dictionary of metadata strings |
| **API reference** | [Mistral](/reference/integrations-mistral) |
| **GitHub link** | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/mistral |
Use `MistralTextEmbedder` to embed a simple string (such as a query) into a vector. To embed lists of documents, use [`MistralDocumentEmbedder`](mistraldocumentembedder.mdx), which enriches each document with its computed embedding (also known as a vector).
## Overview
`MistralTextEmbedder` transforms a string into a vector that captures its semantics using a Mistral embedding model.
The component currently supports the `mistral-embed` embedding model. The list of all supported models can be found in Mistral's [embedding models documentation](https://docs.mistral.ai/platform/endpoints/#embedding-models).
To start using this integration with Haystack, install it with:
```shell
pip install mistral-haystack
```
`MistralTextEmbedder` needs a Mistral API key to work. It uses the `MISTRAL_API_KEY` environment variable by default. Otherwise, you can pass an API key at initialization with `api_key`:
```python
from haystack.utils import Secret
from haystack_integrations.components.embedders.mistral.text_embedder import MistralTextEmbedder

embedder = MistralTextEmbedder(api_key=Secret.from_token("<your-api-key>"), model="mistral-embed")
```
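Alternatively, set the environment variable before starting your application. For example, in a POSIX shell (adapt to your platform):
```shell
export MISTRAL_API_KEY="<your-api-key>"
```
With the variable set, the component can be initialized without an explicit `api_key` argument.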
## Usage
### On its own
Remember to set the `MISTRAL_API_KEY` environment variable first or pass the API key in directly.
Here is how you can use the component on its own:
```python
from haystack.utils import Secret
from haystack_integrations.components.embedders.mistral.text_embedder import MistralTextEmbedder

embedder = MistralTextEmbedder(api_key=Secret.from_token("<your-api-key>"), model="mistral-embed")

result = embedder.run(text="How can I use the Mistral embedding models with Haystack?")
print(result['embedding'])
## [-0.0015687942504882812, 0.052154541015625, 0.037109375...]
```
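The returned vector is what an embedding retriever compares against stored document vectors. As a rough illustration of that comparison (not part of the Mistral integration; the short vectors below are made up for the example), cosine similarity can be computed in plain Python:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query_embedding = [0.1, 0.3, -0.2]    # toy stand-in for the embedder's output
doc_embedding = [0.12, 0.28, -0.18]   # toy stand-in for a stored document vector
score = cosine_similarity(query_embedding, doc_embedding)
```

A score close to 1.0 means the two vectors point in nearly the same direction, which is why `InMemoryDocumentStore` above is configured with `embedding_similarity_function="cosine"`.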
### In a pipeline
Below is an example of the `MistralTextEmbedder` in a document search pipeline. We are building this pipeline on top of an `InMemoryDocumentStore` where we index the contents of two URLs.
```python
from haystack import Pipeline
from haystack.utils import Secret
from haystack.components.builders.chat_prompt_builder import ChatPromptBuilder
from haystack.components.fetchers import LinkContentFetcher
from haystack.components.converters import HTMLToDocument
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.embedders.mistral.document_embedder import MistralDocumentEmbedder
from haystack_integrations.components.embedders.mistral.text_embedder import MistralTextEmbedder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
## Initialize document store
document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")
## Indexing components
fetcher = LinkContentFetcher()
converter = HTMLToDocument()
embedder = MistralDocumentEmbedder()
writer = DocumentWriter(document_store=document_store)
indexing = Pipeline()
indexing.add_component(name="fetcher", instance=fetcher)
indexing.add_component(name="converter", instance=converter)
indexing.add_component(name="embedder", instance=embedder)
indexing.add_component(name="writer", instance=writer)
indexing.connect("fetcher", "converter")
indexing.connect("converter", "embedder")
indexing.connect("embedder", "writer")
indexing.run(data={"fetcher": {"urls": ["https://docs.mistral.ai/self-deployment/cloudflare/",
"https://docs.mistral.ai/platform/endpoints/"]}})
## Retrieval components
text_embedder = MistralTextEmbedder()
retriever = InMemoryEmbeddingRetriever(document_store=document_store)
## Define prompt template
prompt_template = [
ChatMessage.from_system("You are a helpful assistant."),
ChatMessage.from_user(
"Given the retrieved documents, answer the question.\nDocuments:\n"
"{% for document in documents %}{{ document.content }}{% endfor %}\n"
"Question: {{ query }}\nAnswer:"
)
]
prompt_builder = ChatPromptBuilder(template=prompt_template, required_variables=["query", "documents"])
llm = OpenAIChatGenerator(model="gpt-4o-mini", api_key=Secret.from_token("<your-api-key>"))
doc_search = Pipeline()
doc_search.add_component("text_embedder", text_embedder)
doc_search.add_component("retriever", retriever)
doc_search.add_component("prompt_builder", prompt_builder)
doc_search.add_component("llm", llm)
doc_search.connect("text_embedder.embedding", "retriever.query_embedding")
doc_search.connect("retriever.documents", "prompt_builder.documents")
doc_search.connect("prompt_builder.messages", "llm.messages")
query = "How can I deploy Mistral models with Cloudflare?"
result = doc_search.run(
{
"text_embedder": {"text": query},
"retriever": {"top_k": 1},
"prompt_builder": {"query": query}
}
)
print(result["llm"]["replies"])
```