Haystack Bot a471fbfebe
Promote unstable docs for Haystack 2.21 (#10204)
Co-authored-by: vblagoje <458335+vblagoje@users.noreply.github.com>
2025-12-08 20:09:00 +01:00

108 lines
4.7 KiB
Plaintext
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
title: "VertexAIDocumentEmbedder"
id: vertexaidocumentembedder
slug: "/vertexaidocumentembedder"
description: "This component computes embeddings for documents using models through VertexAI Embeddings API."
---
# VertexAIDocumentEmbedder
This component computes embeddings for documents using models through VertexAI Embeddings API.
:::warning Deprecation Notice
This integration uses the deprecated google-generativeai SDK, which will lose support after August 2025.
We recommend switching to the new [GoogleGenAIDocumentEmbedder](googlegenaidocumentembedder.mdx) integration instead.
:::
<div className="key-value-table">
| | |
| --- | --- |
| **Most common position in a pipeline** | Before a [DocumentWriter](../writers/documentwriter.mdx) in an indexing pipeline |
| **Mandatory init variables** | `model`: The model used through the VertexAI Embeddings API |
| **Mandatory run variables** | `documents`: A list of documents to be embedded |
| **Output variables** | `documents`: A list of documents enriched with embeddings |
| **API reference** | [Google Vertex](/reference/integrations-google-vertex) |
| **GitHub link** | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/google_vertex |
</div>
`VertexAIDocumentEmbedder` enriches the metadata of documents with an embedding of their content. To embed a string, use the [`VertexAITextEmbedder`](vertexaitextembedder.mdx).
To use the `VertexAIDocumentEmbedder`, initialize it with:
- `model`: The supported models are:
- "text-embedding-004"
- "text-embedding-005"
- "textembedding-gecko-multilingual@001"
- "text-multilingual-embedding-002"
- "text-embedding-large-exp-03-07"
- `task_type`: "RETRIEVAL_DOCUMENT” is the default. You can find all task types in the official [Google documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/text-embeddings-api#tasktype).
### Authentication
`VertexAIDocumentEmbedder` uses Google Cloud Application Default Credentials (ADCs) for authentication. For more information on how to set up ADCs, see the [official documentation](https://cloud.google.com/docs/authentication/provide-credentials-adc).
Keep in mind that its essential to use an account that has access to a project authorized to use Google Vertex AI endpoints.
You can find your project ID in the [GCP resource manager](https://console.cloud.google.com/cloud-resource-manager) or locally by running `gcloud projects list` in your terminal. For more info on the gcloud CLI, see its [official documentation](https://cloud.google.com/cli).
## Usage
Install the `google-vertex-haystack` package to use this Embedder:
```shell
pip install google-vertex-haystack
```
### On its own
```python
from haystack import Document
from haystack_integrations.components.embedders.google_vertex import VertexAIDocumentEmbedder
doc = Document(content="I love pizza!")
document_embedder = VertexAIDocumentEmbedder(model="text-embedding-005")
result = document_embedder.run([doc])
print(result['documents'][0].embedding)
## [-0.044606007635593414, 0.02857724390923977, -0.03549133986234665,
```
### In a pipeline
```python
from haystack import Document
from haystack import Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.embedders.google_vertex import VertexAITextEmbedder
from haystack_integrations.components.embedders.google_vertex import VertexAIDocumentEmbedder
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")
documents = [Document(content="My name is Wolfgang and I live in Berlin"),
Document(content="I saw a black horse running"),
Document(content="Germany has many big cities")]
document_embedder = VertexAIDocumentEmbedder(model="text-embedding-005")
documents_with_embeddings = document_embedder.run(documents)['documents']
document_store.write_documents(documents_with_embeddings)
query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", VertexAITextEmbedder(model="text-embedding-005"))
query_pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
query = "Who lives in Berlin?"
result = query_pipeline.run({"text_embedder":{"text": query}})
print(result['retriever']['documents'][0])
## Document(id=..., content: 'My name is Wolfgang and I live in Berlin')
```