---
title: "HuggingFaceAPIChatGenerator"
id: huggingfaceapichatgenerator
slug: "/huggingfaceapichatgenerator"
description: "This generator enables chat completion using various Hugging Face APIs."
---

# HuggingFaceAPIChatGenerator

This generator enables chat completion using various Hugging Face APIs.

<div className="key-value-table">

| | |
| --- | --- |
| **Most common position in a pipeline** | After a [`ChatPromptBuilder`](../builders/chatpromptbuilder.mdx) |
| **Mandatory init variables** | `api_type`: The type of Hugging Face API to use <br /> <br />`api_params`: A dictionary with one of the following keys: <br /> <br />- `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`. <br />**OR**<br />- `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or `TEXT_GENERATION_INFERENCE`. <br /> <br />`token`: The Hugging Face API token. Can be set with the `HF_API_TOKEN` or `HF_TOKEN` env var. |
| **Mandatory run variables** | `messages`: A list of [`ChatMessage`](../../concepts/data-classes/chatmessage.mdx) objects representing the chat |
| **Output variables** | `replies`: A list of `ChatMessage` objects containing the LLM's replies to the input chat |
| **API reference** | [Generators](/reference/generators-api) |
| **GitHub link** | https://github.com/deepset-ai/haystack/blob/main/haystack/components/generators/chat/hugging_face_api.py |

</div>

## Overview

`HuggingFaceAPIChatGenerator` can be used to generate chat completions using different Hugging Face APIs:

- [Serverless Inference API (Inference Providers)](https://huggingface.co/docs/inference-providers) - free tier available
- [Paid Inference Endpoints](https://huggingface.co/inference-endpoints)
- [Self-hosted Text Generation Inference](https://github.com/huggingface/text-generation-inference)

This component's main input is a list of `ChatMessage` objects. `ChatMessage` is a data class that contains a message, a role (who generated the message, such as `user`, `assistant`, `system`, `function`), and optional metadata. For more information, check out our [`ChatMessage` docs](../../concepts/data-classes/chatmessage.mdx).

:::note
This component is designed for chat completion, so it expects a list of messages, not a single string. If you want to use Hugging Face APIs for simple text generation (such as translation or summarization tasks) or don't want to use the `ChatMessage` object, use [`HuggingFaceAPIGenerator`](huggingfaceapigenerator.mdx) instead.
:::

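For instance, a minimal chat history with a system and a user message can be built like this:

```python
from haystack.dataclasses import ChatMessage

# The system message sets the assistant's behavior;
# the user message carries the actual question.
messages = [
    ChatMessage.from_system("You are a helpful assistant."),
    ChatMessage.from_user("What's Natural Language Processing?"),
]
```
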
The component uses the `HF_API_TOKEN` environment variable by default. Otherwise, you can pass a Hugging Face API token at initialization with `token` – see code examples below.

The token is needed:

- If you use the Serverless Inference API, or
- If you use the Inference Endpoints.

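For illustration, a minimal sketch of both options (the model here is just an example; never hardcode real tokens in source code):

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.utils import Secret

# Option 1: read the token from the HF_API_TOKEN environment variable (default)
generator = HuggingFaceAPIChatGenerator(
    api_type="serverless_inference_api",
    api_params={"model": "Qwen/Qwen2.5-7B-Instruct"},
    token=Secret.from_env_var("HF_API_TOKEN"),
)

# Option 2: pass the token directly, wrapped in a Secret
generator = HuggingFaceAPIChatGenerator(
    api_type="serverless_inference_api",
    api_params={"model": "Qwen/Qwen2.5-7B-Instruct"},
    token=Secret.from_token("<your-hf-token>"),  # hypothetical placeholder
)
```
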
### Streaming

This Generator supports [streaming](guides-to-generators/choosing-the-right-generator.mdx#streaming-support) the tokens from the LLM directly in output. To do so, pass a function to the `streaming_callback` init parameter.

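For example, here is a minimal sketch using Haystack's built-in `print_streaming_chunk` callback to print tokens to the console as they arrive (the model is just an example; any callable accepting a `StreamingChunk` works too):

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.components.generators.utils import print_streaming_chunk
from haystack.dataclasses import ChatMessage

# Stream each chunk to stdout as soon as the LLM produces it
generator = HuggingFaceAPIChatGenerator(
    api_type="serverless_inference_api",
    api_params={"model": "Qwen/Qwen2.5-7B-Instruct"},
    streaming_callback=print_streaming_chunk,
)

result = generator.run([ChatMessage.from_user("What's Natural Language Processing?")])
```
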
## Usage

### On its own

#### Using Serverless Inference API (Inference Providers) - Free Tier Available

This API allows you to quickly experiment with many models hosted on the Hugging Face Hub, offloading the inference to Hugging Face servers. It's rate-limited and not meant for production.

To use this API, you need a [free Hugging Face token](https://huggingface.co/settings/tokens).
The Generator expects the `model` in `api_params`. It's also recommended to specify a `provider` for better performance and reliability.

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret
from haystack.utils.hf import HFGenerationAPIType

messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]

# the api_type can be expressed using the HFGenerationAPIType enum or as a string
api_type = HFGenerationAPIType.SERVERLESS_INFERENCE_API
api_type = "serverless_inference_api"  # this is equivalent to the above

generator = HuggingFaceAPIChatGenerator(api_type=api_type,
                                        api_params={"model": "Qwen/Qwen2.5-7B-Instruct",
                                                    "provider": "together"},
                                        token=Secret.from_env_var("HF_API_TOKEN"))

result = generator.run(messages)
print(result)
```

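The result is a dictionary with a `replies` key. Each reply is a `ChatMessage`, so you can, for example, inspect just the generated text:

```python
# Access the text of the first reply
print(result["replies"][0].text)
```
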
#### Using Paid Inference Endpoints

In this case, a private instance of the model is deployed by Hugging Face, and you typically pay per hour.

To understand how to spin up an Inference Endpoint, visit the [Hugging Face documentation](https://huggingface.co/inference-endpoints/dedicated).

Here too, you need to provide your Hugging Face token.
The Generator expects the `url` of your endpoint in `api_params`.

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret

messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]

generator = HuggingFaceAPIChatGenerator(api_type="inference_endpoints",
                                        api_params={"url": "<your-inference-endpoint-url>"},
                                        token=Secret.from_env_var("HF_API_TOKEN"))

result = generator.run(messages)
print(result)
```

#### Using Serverless Inference API (Inference Providers) with Text+Image Input

You can also use this component with multimodal models that support both text and image input:

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage, ImageContent
from haystack.utils import Secret
from haystack.utils.hf import HFGenerationAPIType

# Create an image from file path, URL, or base64
image = ImageContent.from_file_path("path/to/your/image.jpg")

# Create a multimodal message with both text and image
messages = [ChatMessage.from_user(content_parts=["Describe this image in detail", image])]

generator = HuggingFaceAPIChatGenerator(
    api_type=HFGenerationAPIType.SERVERLESS_INFERENCE_API,
    api_params={
        "model": "Qwen/Qwen2.5-VL-7B-Instruct",  # Vision Language Model
        "provider": "hyperbolic"
    },
    token=Secret.from_token("<your-api-key>")
)

result = generator.run(messages)
print(result)
```

#### Using Self-Hosted Text Generation Inference (TGI)

[Hugging Face Text Generation Inference](https://github.com/huggingface/text-generation-inference) is a toolkit for efficiently deploying and serving LLMs.

While it powers the most recent versions of the Serverless Inference API and Inference Endpoints, it can also be easily run on-premises through Docker.

For example, you can run a TGI container as follows:

```shell
model=HuggingFaceH4/zephyr-7b-beta
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.4 --model-id $model
```

For more information, refer to the [official TGI repository](https://github.com/huggingface/text-generation-inference).

The Generator expects the `url` of your TGI instance in `api_params`.

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage

messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]

generator = HuggingFaceAPIChatGenerator(api_type="text_generation_inference",
                                        api_params={"url": "http://localhost:8080"})

result = generator.run(messages)
print(result)
```

### In a pipeline

```python
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack import Pipeline
from haystack.utils import Secret
from haystack.utils.hf import HFGenerationAPIType

# no parameter init, we don't use any runtime template variables
prompt_builder = ChatPromptBuilder()
llm = HuggingFaceAPIChatGenerator(api_type=HFGenerationAPIType.SERVERLESS_INFERENCE_API,
                                  api_params={"model": "Qwen/Qwen2.5-7B-Instruct",
                                              "provider": "together"},
                                  token=Secret.from_env_var("HF_API_TOKEN"))

pipe = Pipeline()
pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("llm", llm)
pipe.connect("prompt_builder.prompt", "llm.messages")

location = "Berlin"
messages = [ChatMessage.from_system("Always respond in German even if some input data is in other languages."),
            ChatMessage.from_user("Tell me about {{location}}")]
result = pipe.run(data={"prompt_builder": {"template_variables": {"location": location}, "template": messages}})

print(result)
```

## Additional References

🧑‍🍳 Cookbook: [Build with Google Gemma: chat and RAG](https://haystack.deepset.ai/cookbook/gemma_chat_rag)