---
title: "FastembedDocumentEmbedder"
id: fastembeddocumentembedder
slug: "/fastembeddocumentembedder"
description: "This component computes the embeddings of a list of documents using the models supported by FastEmbed."
---

# FastembedDocumentEmbedder

This component computes the embeddings of a list of documents using the models supported by FastEmbed.

<div className="key-value-table">

|  |  |
| --- | --- |
| **Most common position in a pipeline** | Before a [`DocumentWriter`](../writers/documentwriter.mdx) in an indexing pipeline                   |
| **Mandatory run variables**            | `documents`: A list of documents                                                            |
| **Output variables**                   | `documents`: A list of documents (enriched with embeddings)                                 |
| **API reference**                      | [FastEmbed](/reference/fastembed-embedders)                                                        |
| **GitHub link**                        | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/fastembed |

</div>

This component should be used to embed a list of documents. To embed a string, use the [`FastembedTextEmbedder`](fastembedtextembedder.mdx).

## Overview

`FastembedDocumentEmbedder` computes the embeddings of a list of documents and stores the obtained vectors in the embedding field of each document. It uses embedding [models supported by FastEmbed](https://qdrant.github.io/fastembed/examples/Supported_Models/).

The vectors computed by this component are required for embedding retrieval on a collection of documents. At retrieval time, the vector that represents the query is compared with the document vectors to find the most similar, and therefore most relevant, documents.
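
The comparison step itself does not depend on FastEmbed. The following is a minimal sketch, assuming cosine similarity as the comparison metric; the toy 3-dimensional vectors and the helper names `cosine_similarity` and `rank_documents` are hypothetical and not part of the Haystack API (real embedding models produce vectors with hundreds of dimensions or more).

```python
import math
from typing import List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(
    query_vec: List[float], doc_vecs: List[List[float]], top_k: int = 2
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs for the top_k most similar documents."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings: documents 0 and 2 point in a similar direction.
doc_vecs = [
    [0.9, 0.1, 0.0],
    [0.0, 0.2, 0.9],
    [0.8, 0.3, 0.1],
]
query_vec = [1.0, 0.2, 0.0]

top = rank_documents(query_vec, doc_vecs)
```

A retriever such as `InMemoryEmbeddingRetriever` performs this kind of scoring over the embeddings stored in the Document Store, which is why every document must be embedded before it is written.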

### Compatible models

You can find the original models in the [FastEmbed documentation](https://qdrant.github.io/fastembed/).

Many popular models from the [Massive Text Embedding Benchmark (MTEB) Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) are supported by FastEmbed. To check whether a specific model is supported, see the [supported model list](https://qdrant.github.io/fastembed/examples/Supported_Models/).

### Installation

To start using this integration with Haystack, install the package with:

```shell
pip install fastembed-haystack
```

### Parameters

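You can set `cache_dir`, the directory where the downloaded model is stored, and `threads`, the number of threads a single `onnxruntime` session can use.

```python
from haystack_integrations.components.embedders.fastembed import FastembedDocumentEmbedder

cache_dir = "/your_cacheDirectory"
embedder = FastembedDocumentEmbedder(
    model="BAAI/bge-large-en-v1.5",
    cache_dir=cache_dir,  # where the downloaded model files are cached
    threads=2,  # threads available to a single onnxruntime session
)
```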

To use data-parallel encoding, set the `parallel` and `batch_size` parameters.

- If `parallel` > 1, data-parallel encoding is used. This is recommended for offline encoding of large datasets.
- If `parallel` is 0, all available cores are used.
- If `parallel` is `None`, data-parallel processing is not used; the default `onnxruntime` threading is used instead.
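
The rules above can be summarized in a few lines of code. This is a minimal sketch, assuming documents are split into consecutive batches of `batch_size` items that are then distributed across workers; `data_parallel_workers` and `make_batches` are hypothetical helpers that only illustrate the semantics and are not FastEmbed or Haystack functions.

```python
import os
from typing import List, Optional


def data_parallel_workers(parallel: Optional[int]) -> Optional[int]:
    """Hypothetical helper: how many data-parallel workers `parallel` implies.

    Returns None when data-parallel encoding is disabled.
    """
    if parallel is None:
        # No data-parallel processing: default onnxruntime threading is used.
        return None
    if parallel == 0:
        # 0 means "use all available cores".
        return os.cpu_count()
    # Otherwise, use the requested number of workers.
    return parallel


def make_batches(texts: List[str], batch_size: int) -> List[List[str]]:
    """Split texts into consecutive chunks of at most batch_size items."""
    return [texts[i : i + batch_size] for i in range(0, len(texts), batch_size)]


batches = make_batches(["a", "b", "c", "d", "e"], batch_size=2)
```

On the component itself, both values are constructor arguments, for example `FastembedDocumentEmbedder(model="BAAI/bge-small-en-v1.5", parallel=0, batch_size=256)`.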

:::tip
If you create a Text Embedder and a Document Embedder based on the same model, Haystack reuses the same underlying model instance behind the scenes to save resources.
:::

### Embedding Metadata

Text documents often come with metadata. If the metadata values are distinctive and semantically meaningful, you can embed them along with the document text to improve retrieval.

The Document Embedder supports this through the `meta_fields_to_embed` parameter:

```python
from haystack import Document
from haystack_integrations.components.embedders.fastembed import FastembedDocumentEmbedder

doc = Document(
    content="some text",
    meta={"title": "relevant title", "page number": 18},
)

embedder = FastembedDocumentEmbedder(
    model="BAAI/bge-small-en-v1.5",
    batch_size=256,
    # Embed the "title" metadata value together with the document content
    meta_fields_to_embed=["title"],
)
embedder.warm_up()  # load the model before running

docs_w_embeddings = embedder.run(documents=[doc])["documents"]
```
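
To see what text actually reaches the model, here is a minimal sketch, assuming the embedder joins the selected metadata values and the document content with a separator (the `embedding_separator` parameter, assumed here to default to a newline) and skips fields that are missing or `None`. `build_text_to_embed` is a hypothetical helper for illustration only; it ignores any prefix or suffix the component may add.

```python
from typing import Any, Dict, List


def build_text_to_embed(
    content: str,
    meta: Dict[str, Any],
    meta_fields_to_embed: List[str],
    embedding_separator: str = "\n",
) -> str:
    """Prepend the selected metadata values to the content (illustrative only)."""
    meta_values = [
        str(meta[key])
        for key in meta_fields_to_embed
        if key in meta and meta[key] is not None
    ]
    return embedding_separator.join([*meta_values, content])


text = build_text_to_embed(
    "some text",
    {"title": "relevant title", "page number": 18},
    meta_fields_to_embed=["title"],
)
# text == "relevant title\nsome text"
```

Only the fields listed in `meta_fields_to_embed` are included, so `"page number"` does not affect the embedding in this example.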

## Usage

### On its own

```python
from haystack.dataclasses import Document
from haystack_integrations.components.embedders.fastembed import FastembedDocumentEmbedder

document_list = [
    Document(content="I love pizza!"),
    Document(content="I like spaghetti"),
]

doc_embedder = FastembedDocumentEmbedder()
doc_embedder.warm_up()  # load the model before running

result = doc_embedder.run(documents=document_list)
print(result['documents'][0].embedding)

## [-0.04235665127635002, 0.021791068837046623, ...]
```

### In a pipeline

```python
from haystack import Document, Pipeline
from haystack.components.writers import DocumentWriter
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.document_stores.types import DuplicatePolicy
from haystack_integrations.components.embedders.fastembed import FastembedDocumentEmbedder, FastembedTextEmbedder

document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

documents = [
    Document(content="My name is Wolfgang and I live in Berlin"),
    Document(content="I saw a black horse running"),
    Document(content="Germany has many big cities"),
    Document(content="fastembed is supported by and maintained by Qdrant."),
]

document_embedder = FastembedDocumentEmbedder()
writer = DocumentWriter(document_store=document_store, policy=DuplicatePolicy.OVERWRITE)

indexing_pipeline = Pipeline()
indexing_pipeline.add_component("document_embedder", document_embedder)
indexing_pipeline.add_component("writer", writer)
indexing_pipeline.connect("document_embedder", "writer")

indexing_pipeline.run({"document_embedder": {"documents": documents}})

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", FastembedTextEmbedder())
query_pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "Who supports fastembed?"

result = query_pipeline.run({"text_embedder": {"text": query}})

print(result["retriever"]["documents"][0])

## Document(id=...,
## content: 'fastembed is supported by and maintained by Qdrant.',
## score: 0.758..)
```

## Additional References

🧑‍🍳 Cookbook: [RAG Pipeline Using FastEmbed for Embeddings Generation](https://haystack.deepset.ai/cookbook/rag_fastembed)