Mirror of https://github.com/deepset-ai/haystack.git, synced 2026-01-31 20:13:05 +00:00
---
title: "MistralTextEmbedder"
id: mistraltextembedder
slug: "/mistraltextembedder"
description: "This component transforms a string into a vector using the Mistral API and models. Use it for embedding retrieval to transform your query into an embedding."
---

# MistralTextEmbedder

This component transforms a string into a vector using the Mistral API and models. Use it for embedding retrieval to transform your query into an embedding.

| | |
| --- | --- |
| **Most common position in a pipeline** | Before an embedding [Retriever](/docs/pipeline-components/retrievers.mdx) in a query/RAG pipeline |
| **Mandatory init variables** | "api_key": The Mistral API key. Can be set with the `MISTRAL_API_KEY` env var. |
| **Mandatory run variables** | "text": A string |
| **Output variables** | "embedding": A list of floats (the embedding vector) <br /> <br />"meta": A dictionary of metadata strings |
| **API reference** | [Mistral](/reference/integrations-mistral) |
| **GitHub link** | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/mistral |

Use `MistralTextEmbedder` to embed a single string (such as a query) into a vector. To embed a list of documents, use the [`MistralDocumentEmbedder`](mistraldocumentembedder.mdx) instead, which enriches each document with its computed embedding (also known as a vector).

## Overview

`MistralTextEmbedder` transforms a string into a vector that captures its semantics using a Mistral embedding model.

The component currently supports the `mistral-embed` embedding model. The list of all supported models can be found in Mistral’s [embedding models documentation](https://docs.mistral.ai/platform/endpoints/#embedding-models).

To start using this integration with Haystack, install it with:

```shell
pip install mistral-haystack
```

`MistralTextEmbedder` needs a Mistral API key to work. It reads the `MISTRAL_API_KEY` environment variable by default. Otherwise, you can pass an API key at initialization with `api_key`:

```python
from haystack.utils import Secret
from haystack_integrations.components.embedders.mistral.text_embedder import MistralTextEmbedder

embedder = MistralTextEmbedder(api_key=Secret.from_token("<your-api-key>"), model="mistral-embed")
```

## Usage

### On its own

Remember to set the `MISTRAL_API_KEY` environment variable first or pass the key in directly.

Here is how you can use the component on its own:

```python
from haystack.utils import Secret
from haystack_integrations.components.embedders.mistral.text_embedder import MistralTextEmbedder

embedder = MistralTextEmbedder(api_key=Secret.from_token("<your-api-key>"), model="mistral-embed")

result = embedder.run(text="How can I use the Mistral embedding models with Haystack?")

print(result['embedding'])
# [-0.0015687942504882812, 0.052154541015625, 0.037109375...]
```
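
Embedding retrieval then scores the query vector against stored document vectors, typically with cosine similarity. As a minimal, self-contained sketch (not part of the Mistral integration — the vectors and document names below are made up for illustration, and real `mistral-embed` vectors are much longer):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" standing in for real model output.
query_embedding = [0.9, 0.1, 0.0]
doc_embeddings = {
    "doc about embeddings": [0.8, 0.2, 0.1],
    "unrelated doc": [0.0, 0.1, 0.9],
}

# Rank documents by similarity to the query, highest first.
ranked = sorted(
    doc_embeddings,
    key=lambda d: cosine_similarity(query_embedding, doc_embeddings[d]),
    reverse=True,
)
print(ranked[0])  # "doc about embeddings"
```

An embedding retriever performs essentially this ranking over the document store, returning the `top_k` closest documents.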

### In a pipeline

Below is an example of `MistralTextEmbedder` in a document search pipeline. The pipeline is built on top of an `InMemoryDocumentStore` into which we first index the contents of two URLs.

```python
from haystack import Pipeline
from haystack.utils import Secret
from haystack.components.builders.chat_prompt_builder import ChatPromptBuilder
from haystack.components.fetchers import LinkContentFetcher
from haystack.components.converters import HTMLToDocument
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.embedders.mistral.document_embedder import MistralDocumentEmbedder
from haystack_integrations.components.embedders.mistral.text_embedder import MistralTextEmbedder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage

# Initialize the document store
document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

# Indexing components
fetcher = LinkContentFetcher()
converter = HTMLToDocument()
embedder = MistralDocumentEmbedder()
writer = DocumentWriter(document_store=document_store)

indexing = Pipeline()
indexing.add_component(name="fetcher", instance=fetcher)
indexing.add_component(name="converter", instance=converter)
indexing.add_component(name="embedder", instance=embedder)
indexing.add_component(name="writer", instance=writer)

indexing.connect("fetcher", "converter")
indexing.connect("converter", "embedder")
indexing.connect("embedder", "writer")

indexing.run(data={"fetcher": {"urls": ["https://docs.mistral.ai/self-deployment/cloudflare/",
                                        "https://docs.mistral.ai/platform/endpoints/"]}})

# Retrieval components
text_embedder = MistralTextEmbedder()
retriever = InMemoryEmbeddingRetriever(document_store=document_store)

# Define the prompt template
prompt_template = [
    ChatMessage.from_system("You are a helpful assistant."),
    ChatMessage.from_user(
        "Given the retrieved documents, answer the question.\nDocuments:\n"
        "{% for document in documents %}{{ document.content }}{% endfor %}\n"
        "Question: {{ query }}\nAnswer:"
    )
]

prompt_builder = ChatPromptBuilder(template=prompt_template, required_variables={"query", "documents"})
llm = OpenAIChatGenerator(model="gpt-4o-mini", api_key=Secret.from_token("<your-api-key>"))

doc_search = Pipeline()
doc_search.add_component("text_embedder", text_embedder)
doc_search.add_component("retriever", retriever)
doc_search.add_component("prompt_builder", prompt_builder)
doc_search.add_component("llm", llm)

doc_search.connect("text_embedder.embedding", "retriever.query_embedding")
doc_search.connect("retriever.documents", "prompt_builder.documents")
doc_search.connect("prompt_builder.messages", "llm.messages")

query = "How can I deploy Mistral models with Cloudflare?"

result = doc_search.run(
    {
        "text_embedder": {"text": query},
        "retriever": {"top_k": 1},
        "prompt_builder": {"query": query}
    }
)

print(result["llm"]["replies"])
```
|