---
title: "Generators"
id: generators-api
description: "Enables text generation using LLMs."
slug: "/generators-api"
---

<a id="azure"></a>

# Module azure

<a id="azure.AzureOpenAIGenerator"></a>

## AzureOpenAIGenerator

Generates text using OpenAI's large language models (LLMs).

It works with gpt-4-type models and supports streaming responses
from the OpenAI API.

You can customize how the text is generated by passing parameters to the
OpenAI API. Use the `**generation_kwargs` argument when you initialize
the component or when you run it. Any parameter that works with
`openai.ChatCompletion.create` will work here too.

For details on OpenAI API parameters, see
[OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).

### Usage example

```python
from haystack.components.generators import AzureOpenAIGenerator
from haystack.utils import Secret

client = AzureOpenAIGenerator(
    azure_endpoint="<Your Azure endpoint, e.g. https://your-company.azure.openai.com/>",
    api_key=Secret.from_token("<your-api-key>"),
    azure_deployment="<the model name, e.g. gpt-4o-mini>")
response = client.run("What's Natural Language Processing? Be brief.")
print(response)
```

```
>> {'replies': ['Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
>> the interaction between computers and human language. It involves enabling computers to understand, interpret,
>> and respond to natural human language in a way that is both meaningful and useful.'], 'meta': [{'model':
>> 'gpt-4o-mini', 'index': 0, 'finish_reason': 'stop', 'usage': {'prompt_tokens': 16,
>> 'completion_tokens': 49, 'total_tokens': 65}}]}
```
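
You can also pass `generation_kwargs` at run time to override the values set at
initialization. A minimal sketch (the endpoint, key, and deployment placeholders
are illustrative):

```python
from haystack.components.generators import AzureOpenAIGenerator
from haystack.utils import Secret

client = AzureOpenAIGenerator(
    azure_endpoint="<your-endpoint>",
    api_key=Secret.from_token("<your-api-key>"),
    azure_deployment="<your-deployment>",
    # defaults applied to every call
    generation_kwargs={"temperature": 0.2})

# per-call kwargs take precedence over those set at initialization
response = client.run(
    "Summarize NLP in one sentence.",
    generation_kwargs={"max_completion_tokens": 60, "temperature": 0.9})
print(response["replies"][0])
```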

<a id="azure.AzureOpenAIGenerator.__init__"></a>

#### AzureOpenAIGenerator.\_\_init\_\_

```python
def __init__(azure_endpoint: Optional[str] = None,
             api_version: Optional[str] = "2023-05-15",
             azure_deployment: Optional[str] = "gpt-4o-mini",
             api_key: Optional[Secret] = Secret.from_env_var(
                 "AZURE_OPENAI_API_KEY", strict=False),
             azure_ad_token: Optional[Secret] = Secret.from_env_var(
                 "AZURE_OPENAI_AD_TOKEN", strict=False),
             organization: Optional[str] = None,
             streaming_callback: Optional[StreamingCallbackT] = None,
             system_prompt: Optional[str] = None,
             timeout: Optional[float] = None,
             max_retries: Optional[int] = None,
             http_client_kwargs: Optional[dict[str, Any]] = None,
             generation_kwargs: Optional[dict[str, Any]] = None,
             default_headers: Optional[dict[str, str]] = None,
             *,
             azure_ad_token_provider: Optional[AzureADTokenProvider] = None)
```

Initialize the Azure OpenAI Generator.

**Arguments**:

- `azure_endpoint`: The endpoint of the deployed model, for example `https://example-resource.azure.openai.com/`.
- `api_version`: The version of the API to use. Defaults to 2023-05-15.
- `azure_deployment`: The deployment of the model, usually the model name.
- `api_key`: The API key to use for authentication.
- `azure_ad_token`: [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
- `organization`: Your organization ID, defaults to `None`. For help, see
[Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- `streaming_callback`: A callback function called when a new token is received from the stream.
It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
as an argument.
- `system_prompt`: The system prompt to use for text generation. If not provided, the system prompt is
omitted, and the default system prompt of the model is used.
- `timeout`: Timeout for the AzureOpenAI client. If not set, it is inferred from the
`OPENAI_TIMEOUT` environment variable or set to 30.
- `max_retries`: Maximum retries to establish contact with AzureOpenAI if it returns an internal error.
If not set, it is inferred from the `OPENAI_MAX_RETRIES` environment variable or set to 5.
- `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
- `generation_kwargs`: Other parameters to use for the model, sent directly to
the OpenAI endpoint. See the [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat) for
more details.
Some of the supported parameters:
- `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
including visible output tokens and reasoning tokens.
- `temperature`: The sampling temperature to use. Higher values mean the model takes more risks.
Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
- `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
comprising the top 10% probability mass are considered.
- `n`: The number of completions to generate for each prompt. For example, with 3 prompts and n=2,
the LLM will generate two completions per prompt, resulting in 6 completions total.
- `stop`: One or more sequences after which the LLM should stop generating tokens.
- `presence_penalty`: The penalty applied if a token is already present.
Higher values make the model less likely to repeat the token.
- `frequency_penalty`: Penalty applied if a token has already been generated.
Higher values make the model less likely to repeat the token.
- `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
values are the bias to add to that token.
- `default_headers`: Default headers to use for the AzureOpenAI client.
- `azure_ad_token_provider`: A function that returns an Azure Active Directory token, invoked on
every request.
<a id="azure.AzureOpenAIGenerator.to_dict"></a>
|
|
|
|
#### AzureOpenAIGenerator.to\_dict
|
|
|
|
```python
|
|
def to_dict() -> dict[str, Any]
|
|
```
|
|
|
|
Serialize this component to a dictionary.
|
|
|
|
**Returns**:
|
|
|
|
The serialized component as a dictionary.
|
|
|
|
<a id="azure.AzureOpenAIGenerator.from_dict"></a>
|
|
|
|
#### AzureOpenAIGenerator.from\_dict
|
|
|
|
```python
|
|
@classmethod
|
|
def from_dict(cls, data: dict[str, Any]) -> "AzureOpenAIGenerator"
|
|
```
|
|
|
|
Deserialize this component from a dictionary.
|
|
|
|
**Arguments**:
|
|
|
|
- `data`: The dictionary representation of this component.
|
|
|
|
**Returns**:
|
|
|
|
The deserialized component instance.
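
A quick round-trip sketch of the two serialization helpers above; `from_dict`
rebuilds an equivalent component from the output of `to_dict` (the endpoint
placeholder is illustrative):

```python
from haystack.components.generators import AzureOpenAIGenerator

client = AzureOpenAIGenerator(azure_endpoint="<your-endpoint>")
data = client.to_dict()  # a plain dict, suitable for storing as JSON or YAML
restored = AzureOpenAIGenerator.from_dict(data)
```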

<a id="azure.AzureOpenAIGenerator.run"></a>

#### AzureOpenAIGenerator.run

```python
@component.output_types(replies=list[str], meta=list[dict[str, Any]])
def run(prompt: str,
        system_prompt: Optional[str] = None,
        streaming_callback: Optional[StreamingCallbackT] = None,
        generation_kwargs: Optional[dict[str, Any]] = None)
```

Invoke the text generation inference based on the provided messages and generation parameters.

**Arguments**:

- `prompt`: The string prompt to use for text generation.
- `system_prompt`: The system prompt to use for text generation. If omitted at run time, the system
prompt defined at initialization time, if any, is used.
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
- `generation_kwargs`: Additional keyword arguments for text generation. These parameters will potentially override the parameters
passed in the `__init__` method. For more details on the parameters supported by the OpenAI API, refer to
the OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat/create).

**Returns**:

A list of strings containing the generated responses and a list of dictionaries containing the metadata
for each response.

<a id="hugging_face_local"></a>

# Module hugging\_face\_local

<a id="hugging_face_local.HuggingFaceLocalGenerator"></a>

## HuggingFaceLocalGenerator

Generates text using models from Hugging Face that run locally.

LLMs running locally may need powerful hardware.

### Usage example

```python
from haystack.components.generators import HuggingFaceLocalGenerator

generator = HuggingFaceLocalGenerator(
    model="google/flan-t5-large",
    task="text2text-generation",
    generation_kwargs={"max_new_tokens": 100, "temperature": 0.9})

generator.warm_up()

print(generator.run("Who is the best American actor?"))
# {'replies': ['John Cusack']}
```

<a id="hugging_face_local.HuggingFaceLocalGenerator.__init__"></a>

#### HuggingFaceLocalGenerator.\_\_init\_\_

```python
def __init__(model: str = "google/flan-t5-base",
             task: Optional[Literal["text-generation",
                                    "text2text-generation"]] = None,
             device: Optional[ComponentDevice] = None,
             token: Optional[Secret] = Secret.from_env_var(
                 ["HF_API_TOKEN", "HF_TOKEN"], strict=False),
             generation_kwargs: Optional[dict[str, Any]] = None,
             huggingface_pipeline_kwargs: Optional[dict[str, Any]] = None,
             stop_words: Optional[list[str]] = None,
             streaming_callback: Optional[StreamingCallbackT] = None)
```

Creates an instance of a HuggingFaceLocalGenerator.

**Arguments**:

- `model`: The Hugging Face text generation model name or path.
- `task`: The task for the Hugging Face pipeline. Possible options:
- `text-generation`: Supported by decoder models, like GPT.
- `text2text-generation`: Supported by encoder-decoder models, like T5.
If the task is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
If not specified, the component calls the Hugging Face API to infer the task from the model name.
- `device`: The device for loading the model. If `None`, automatically selects the default device.
If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter.
- `token`: The token to use as HTTP bearer authorization for remote files.
If the token is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
- `generation_kwargs`: A dictionary with keyword arguments to customize text generation.
Some examples: `max_length`, `max_new_tokens`, `temperature`, `top_k`, `top_p`.
See Hugging Face's documentation for more information:
- [customize-text-generation](https://huggingface.co/docs/transformers/main/en/generation_strategies#customize-text-generation)
- [transformers.GenerationConfig](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationConfig)
- `huggingface_pipeline_kwargs`: Dictionary with keyword arguments to initialize the
Hugging Face pipeline for text generation.
These keyword arguments provide fine-grained control over the Hugging Face pipeline.
In case of duplication, these kwargs override `model`, `task`, `device`, and `token` init parameters.
For available kwargs, see [Hugging Face documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.pipeline.task).
In this dictionary, you can also include `model_kwargs` to specify the kwargs for model initialization:
[transformers.PreTrainedModel.from_pretrained](https://huggingface.co/docs/transformers/en/main_classes/model#transformers.PreTrainedModel.from_pretrained)
- `stop_words`: A list of stop words. If the model generates a stop word, the generation stops.
If you provide this parameter, don't specify the `stopping_criteria` in `generation_kwargs`.
For some chat models, the output includes both the new text and the original prompt.
In these cases, make sure your prompt has no stop words.
- `streaming_callback`: An optional callable for handling streaming responses.
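
Since the callback receives each chunk as it is produced, token-by-token
printing can be sketched like this (model and task reuse the example above;
the `content` attribute of a `StreamingChunk` holds the newly generated text):

```python
from haystack.components.generators import HuggingFaceLocalGenerator

generator = HuggingFaceLocalGenerator(
    model="google/flan-t5-large",
    task="text2text-generation",
    # print each streamed chunk as soon as it arrives
    streaming_callback=lambda chunk: print(chunk.content, end="", flush=True))

generator.warm_up()
generator.run("Explain tokenization in one sentence.")
```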

<a id="hugging_face_local.HuggingFaceLocalGenerator.warm_up"></a>

#### HuggingFaceLocalGenerator.warm\_up

```python
def warm_up()
```

Initializes the component.

<a id="hugging_face_local.HuggingFaceLocalGenerator.to_dict"></a>

#### HuggingFaceLocalGenerator.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns**:

Dictionary with serialized data.

<a id="hugging_face_local.HuggingFaceLocalGenerator.from_dict"></a>

#### HuggingFaceLocalGenerator.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "HuggingFaceLocalGenerator"
```

Deserializes the component from a dictionary.

**Arguments**:

- `data`: The dictionary to deserialize from.

**Returns**:

The deserialized component.

<a id="hugging_face_local.HuggingFaceLocalGenerator.run"></a>

#### HuggingFaceLocalGenerator.run

```python
@component.output_types(replies=list[str])
def run(prompt: str,
        streaming_callback: Optional[StreamingCallbackT] = None,
        generation_kwargs: Optional[dict[str, Any]] = None)
```

Run the text generation model on the given prompt.

**Arguments**:

- `prompt`: A string representing the prompt.
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
- `generation_kwargs`: Additional keyword arguments for text generation.

**Returns**:

A dictionary containing the generated replies.
- replies: A list of strings representing the generated replies.

<a id="hugging_face_api"></a>

# Module hugging\_face\_api

<a id="hugging_face_api.HuggingFaceAPIGenerator"></a>

## HuggingFaceAPIGenerator

Generates text using Hugging Face APIs.

Use it with the following Hugging Face APIs:
- [Paid Inference Endpoints](https://huggingface.co/inference-endpoints)
- [Self-hosted Text Generation Inference](https://github.com/huggingface/text-generation-inference)

**Note:** As of July 2025, the Hugging Face Inference API no longer offers generative models through the
`text_generation` endpoint. Generative models are now only available through providers supporting the
`chat_completion` endpoint. As a result, this component might no longer work with the Hugging Face Inference API.
Use the `HuggingFaceAPIChatGenerator` component, which supports the `chat_completion` endpoint.

### Usage examples

#### With Hugging Face Inference Endpoints

```python
from haystack.components.generators import HuggingFaceAPIGenerator
from haystack.utils import Secret

generator = HuggingFaceAPIGenerator(api_type="inference_endpoints",
                                    api_params={"url": "<your-inference-endpoint-url>"},
                                    token=Secret.from_token("<your-api-key>"))

result = generator.run(prompt="What's Natural Language Processing?")
print(result)
```

#### With self-hosted text generation inference

```python
from haystack.components.generators import HuggingFaceAPIGenerator

generator = HuggingFaceAPIGenerator(api_type="text_generation_inference",
                                    api_params={"url": "http://localhost:8080"})

result = generator.run(prompt="What's Natural Language Processing?")
print(result)
```

#### With the free serverless inference API

Be aware that this example might not work as the Hugging Face Inference API no longer offers models that support the
`text_generation` endpoint. Use the `HuggingFaceAPIChatGenerator` for generative models through the
`chat_completion` endpoint.

```python
from haystack.components.generators import HuggingFaceAPIGenerator
from haystack.utils import Secret

generator = HuggingFaceAPIGenerator(api_type="serverless_inference_api",
                                    api_params={"model": "HuggingFaceH4/zephyr-7b-beta"},
                                    token=Secret.from_token("<your-api-key>"))

result = generator.run(prompt="What's Natural Language Processing?")
print(result)
```

<a id="hugging_face_api.HuggingFaceAPIGenerator.__init__"></a>

#### HuggingFaceAPIGenerator.\_\_init\_\_

```python
def __init__(api_type: Union[HFGenerationAPIType, str],
             api_params: dict[str, str],
             token: Optional[Secret] = Secret.from_env_var(
                 ["HF_API_TOKEN", "HF_TOKEN"], strict=False),
             generation_kwargs: Optional[dict[str, Any]] = None,
             stop_words: Optional[list[str]] = None,
             streaming_callback: Optional[StreamingCallbackT] = None)
```

Initialize the HuggingFaceAPIGenerator instance.

**Arguments**:

- `api_type`: The type of Hugging Face API to use. Available types:
- `text_generation_inference`: See [TGI](https://github.com/huggingface/text-generation-inference).
- `inference_endpoints`: See [Inference Endpoints](https://huggingface.co/inference-endpoints).
- `serverless_inference_api`: See [Serverless Inference API](https://huggingface.co/inference-api).
This might no longer work due to changes in the models offered in the Hugging Face Inference API.
Please use the `HuggingFaceAPIChatGenerator` component instead.
- `api_params`: A dictionary with the following keys:
- `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`.
- `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or
`TEXT_GENERATION_INFERENCE`.
- Other parameters specific to the chosen API type, such as `timeout`, `headers`, or `provider`.
- `token`: The Hugging Face token to use as HTTP bearer authorization.
Check your HF token in your [account settings](https://huggingface.co/settings/tokens).
- `generation_kwargs`: A dictionary with keyword arguments to customize text generation. Some examples: `max_new_tokens`,
`temperature`, `top_k`, `top_p`.
For details, see the [Hugging Face documentation](https://huggingface.co/docs/huggingface_hub/en/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation).
- `stop_words`: An optional list of strings representing the stop words.
- `streaming_callback`: An optional callable for handling streaming responses.

<a id="hugging_face_api.HuggingFaceAPIGenerator.to_dict"></a>

#### HuggingFaceAPIGenerator.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns**:

A dictionary containing the serialized component.

<a id="hugging_face_api.HuggingFaceAPIGenerator.from_dict"></a>

#### HuggingFaceAPIGenerator.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "HuggingFaceAPIGenerator"
```

Deserialize this component from a dictionary.

<a id="hugging_face_api.HuggingFaceAPIGenerator.run"></a>

#### HuggingFaceAPIGenerator.run

```python
@component.output_types(replies=list[str], meta=list[dict[str, Any]])
def run(prompt: str,
        streaming_callback: Optional[StreamingCallbackT] = None,
        generation_kwargs: Optional[dict[str, Any]] = None)
```

Invoke the text generation inference for the given prompt and generation parameters.

**Arguments**:

- `prompt`: A string representing the prompt.
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
- `generation_kwargs`: Additional keyword arguments for text generation.

**Returns**:

A dictionary with the generated replies and metadata. Both are lists of length n.
- replies: A list of strings representing the generated replies.
- meta: A list of dictionaries containing the metadata for each reply.
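
Replies and metadata line up by index, so a call against the self-hosted TGI
setup from the examples above can be inspected like this (a sketch assuming a
TGI server running at `http://localhost:8080`):

```python
from haystack.components.generators import HuggingFaceAPIGenerator

generator = HuggingFaceAPIGenerator(
    api_type="text_generation_inference",
    api_params={"url": "http://localhost:8080"},
    stop_words=["\n\n"],  # stop at the first blank line
    generation_kwargs={"max_new_tokens": 120})

result = generator.run(prompt="What's Natural Language Processing?")
print(result["replies"][0])  # generated text
print(result["meta"][0])     # metadata for the same reply
```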

<a id="openai"></a>

# Module openai

<a id="openai.OpenAIGenerator"></a>

## OpenAIGenerator

Generates text using OpenAI's large language models (LLMs).

It works with gpt-4 and o-series models and supports streaming responses
from the OpenAI API. It uses strings as input and output.

You can customize how the text is generated by passing parameters to the
OpenAI API. Use the `**generation_kwargs` argument when you initialize
the component or when you run it. Any parameter that works with
`openai.ChatCompletion.create` will work here too.

For details on OpenAI API parameters, see
[OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).

### Usage example

```python
from haystack.components.generators import OpenAIGenerator
client = OpenAIGenerator()
response = client.run("What's Natural Language Processing? Be brief.")
print(response)

>> {'replies': ['Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
>> the interaction between computers and human language. It involves enabling computers to understand, interpret,
>> and respond to natural human language in a way that is both meaningful and useful.'], 'meta': [{'model':
>> 'gpt-4o-mini', 'index': 0, 'finish_reason': 'stop', 'usage': {'prompt_tokens': 16,
>> 'completion_tokens': 49, 'total_tokens': 65}}]}
```

<a id="openai.OpenAIGenerator.__init__"></a>

#### OpenAIGenerator.\_\_init\_\_

```python
def __init__(api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
             model: str = "gpt-4o-mini",
             streaming_callback: Optional[StreamingCallbackT] = None,
             api_base_url: Optional[str] = None,
             organization: Optional[str] = None,
             system_prompt: Optional[str] = None,
             generation_kwargs: Optional[dict[str, Any]] = None,
             timeout: Optional[float] = None,
             max_retries: Optional[int] = None,
             http_client_kwargs: Optional[dict[str, Any]] = None)
```

Creates an instance of OpenAIGenerator. Unless specified otherwise in `model`, uses OpenAI's gpt-4o-mini model.

By setting the `OPENAI_TIMEOUT` and `OPENAI_MAX_RETRIES` environment variables, you can change the timeout
and max_retries parameters in the OpenAI client.

**Arguments**:

- `api_key`: The OpenAI API key to connect to OpenAI.
- `model`: The name of the model to use.
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
The callback function accepts StreamingChunk as an argument.
- `api_base_url`: An optional base URL.
- `organization`: The Organization ID, defaults to `None`.
- `system_prompt`: The system prompt to use for text generation. If not provided, the system prompt is
omitted, and the default system prompt of the model is used.
- `generation_kwargs`: Other parameters to use for the model. These parameters are all sent directly to
the OpenAI endpoint. See OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat) for
more details.
Some of the supported parameters:
- `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
including visible output tokens and reasoning tokens.
- `temperature`: What sampling temperature to use. Higher values mean the model will take more risks.
Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
- `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
considers the results of the tokens with top_p probability mass. So, 0.1 means only the tokens
comprising the top 10% probability mass are considered.
- `n`: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2,
it will generate two completions for each of the three prompts, ending up with 6 completions in total.
- `stop`: One or more sequences after which the LLM should stop generating tokens.
- `presence_penalty`: What penalty to apply if a token is already present. Bigger values mean
the model will be less likely to repeat the same token in the text.
- `frequency_penalty`: What penalty to apply if a token has already been generated in the text.
Bigger values mean the model will be less likely to repeat the same token in the text.
- `logit_bias`: Add a logit bias to specific tokens. The keys of the dictionary are tokens, and the
values are the bias to add to that token.
- `timeout`: Timeout for OpenAI client calls. If not set, it is inferred from the `OPENAI_TIMEOUT` environment variable
or set to 30.
- `max_retries`: Maximum retries to establish contact with OpenAI if it returns an internal error. If not set, it is inferred
from the `OPENAI_MAX_RETRIES` environment variable or set to 5.
- `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).

<a id="openai.OpenAIGenerator.to_dict"></a>

#### OpenAIGenerator.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns**:

The serialized component as a dictionary.

<a id="openai.OpenAIGenerator.from_dict"></a>

#### OpenAIGenerator.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "OpenAIGenerator"
```

Deserialize this component from a dictionary.

**Arguments**:

- `data`: The dictionary representation of this component.

**Returns**:

The deserialized component instance.

<a id="openai.OpenAIGenerator.run"></a>

#### OpenAIGenerator.run

```python
@component.output_types(replies=list[str], meta=list[dict[str, Any]])
def run(prompt: str,
        system_prompt: Optional[str] = None,
        streaming_callback: Optional[StreamingCallbackT] = None,
        generation_kwargs: Optional[dict[str, Any]] = None)
```

Invoke the text generation inference based on the provided messages and generation parameters.

**Arguments**:

- `prompt`: The string prompt to use for text generation.
- `system_prompt`: The system prompt to use for text generation. If omitted at run time, the system
prompt defined at initialization time, if any, is used.
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
- `generation_kwargs`: Additional keyword arguments for text generation. These parameters will potentially override the parameters
passed in the `__init__` method. For more details on the parameters supported by the OpenAI API, refer to
the OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat/create).

**Returns**:

A list of strings containing the generated responses and a list of dictionaries containing the metadata
for each response.
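
A short sketch combining the run-time overrides above: a per-call system prompt
plus sampling parameters (assumes `OPENAI_API_KEY` is set in the environment):

```python
from haystack.components.generators import OpenAIGenerator

client = OpenAIGenerator()

# run-time arguments take precedence over those given at initialization
response = client.run(
    "What's Natural Language Processing? Be brief.",
    system_prompt="You answer in exactly one sentence.",
    generation_kwargs={"temperature": 0.2, "max_completion_tokens": 60})

print(response["replies"][0])
print(response["meta"][0]["usage"])
```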

<a id="openai_dalle"></a>

# Module openai\_dalle

<a id="openai_dalle.DALLEImageGenerator"></a>

## DALLEImageGenerator

Generates images using OpenAI's DALL-E model.

For details on OpenAI API parameters, see
[OpenAI documentation](https://platform.openai.com/docs/api-reference/images/create).

### Usage example

```python
from haystack.components.generators import DALLEImageGenerator
image_generator = DALLEImageGenerator()
response = image_generator.run("Show me a picture of a black cat.")
print(response)
```

<a id="openai_dalle.DALLEImageGenerator.__init__"></a>

#### DALLEImageGenerator.\_\_init\_\_

```python
def __init__(model: str = "dall-e-3",
             quality: Literal["standard", "hd"] = "standard",
             size: Literal["256x256", "512x512", "1024x1024", "1792x1024",
                           "1024x1792"] = "1024x1024",
             response_format: Literal["url", "b64_json"] = "url",
             api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
             api_base_url: Optional[str] = None,
             organization: Optional[str] = None,
             timeout: Optional[float] = None,
             max_retries: Optional[int] = None,
             http_client_kwargs: Optional[dict[str, Any]] = None)
```

Creates an instance of DALLEImageGenerator. Unless specified otherwise in `model`, uses OpenAI's dall-e-3.

**Arguments**:

- `model`: The model to use for image generation. Can be "dall-e-2" or "dall-e-3".
- `quality`: The quality of the generated image. Can be "standard" or "hd".
- `size`: The size of the generated images.
Must be one of 256x256, 512x512, or 1024x1024 for dall-e-2.
Must be one of 1024x1024, 1792x1024, or 1024x1792 for dall-e-3 models.
- `response_format`: The format of the response. Can be "url" or "b64_json".
- `api_key`: The OpenAI API key to connect to OpenAI.
- `api_base_url`: An optional base URL.
- `organization`: The Organization ID, defaults to `None`.
- `timeout`: Timeout for OpenAI client calls. If not set, it is inferred from the `OPENAI_TIMEOUT` environment variable
or set to 30.
- `max_retries`: Maximum retries to establish contact with OpenAI if it returns an internal error. If not set, it is inferred
from the `OPENAI_MAX_RETRIES` environment variable or set to 5.
- `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).

<a id="openai_dalle.DALLEImageGenerator.warm_up"></a>

#### DALLEImageGenerator.warm\_up

```python
def warm_up() -> None
```

Warm up the OpenAI client.

<a id="openai_dalle.DALLEImageGenerator.run"></a>

#### DALLEImageGenerator.run

```python
@component.output_types(images=list[str], revised_prompt=str)
def run(prompt: str,
        size: Optional[Literal["256x256", "512x512", "1024x1024", "1792x1024",
                               "1024x1792"]] = None,
        quality: Optional[Literal["standard", "hd"]] = None,
        response_format: Optional[Literal["url", "b64_json"]] = None)
```

Invokes the image generation inference based on the provided prompt and generation parameters.

**Arguments**:

- `prompt`: The prompt to generate the image.
- `size`: If provided, overrides the size provided during initialization.
- `quality`: If provided, overrides the quality provided during initialization.
- `response_format`: If provided, overrides the response format provided during initialization.

**Returns**:

A dictionary containing the generated list of images and the revised prompt.
Depending on the `response_format` parameter, the list of images can be URLs or base64-encoded JSON strings.
The revised prompt is the prompt that was used to generate the image, if there was any revision
to the prompt made by OpenAI.
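
A brief sketch of those run-time overrides (assumes `OPENAI_API_KEY` is set;
the values chosen here are valid for the default dall-e-3 model):

```python
from haystack.components.generators import DALLEImageGenerator

image_generator = DALLEImageGenerator()  # dall-e-3 by default
image_generator.warm_up()

result = image_generator.run(
    "A black cat sitting on a bookshelf.",
    size="1792x1024",            # overrides the size set at initialization
    quality="hd",
    response_format="b64_json")  # base64-encoded images instead of URLs

print(result["revised_prompt"])
print(len(result["images"]))
```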

<a id="openai_dalle.DALLEImageGenerator.to_dict"></a>

#### DALLEImageGenerator.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns**:

The serialized component as a dictionary.

<a id="openai_dalle.DALLEImageGenerator.from_dict"></a>

#### DALLEImageGenerator.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "DALLEImageGenerator"
```

Deserialize this component from a dictionary.

**Arguments**:

- `data`: The dictionary representation of this component.

**Returns**:

The deserialized component instance.
<a id="chat/azure"></a>
|
|
|
|
# Module chat/azure
|
|
|
|
<a id="chat/azure.AzureOpenAIChatGenerator"></a>
|
|
|
|
## AzureOpenAIChatGenerator
|
|
|
|
Generates text using OpenAI's models on Azure.
|
|
|
|
It works with the gpt-4 - type models and supports streaming responses
|
|
from OpenAI API. It uses [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
|
|
format in input and output.
|
|
|
|
You can customize how the text is generated by passing parameters to the
|
|
OpenAI API. Use the `**generation_kwargs` argument when you initialize
|
|
the component or when you run it. Any parameter that works with
|
|
`openai.ChatCompletion.create` will work here too.
|
|
|
|
For details on OpenAI API parameters, see
|
|
[OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
|
|
|
|
### Usage example
|
|
|
|
```python
|
|
from haystack.components.generators.chat import AzureOpenAIChatGenerator
|
|
from haystack.dataclasses import ChatMessage
|
|
from haystack.utils import Secret
|
|
|
|
messages = [ChatMessage.from_user("What's Natural Language Processing?")]
|
|
|
|
client = AzureOpenAIChatGenerator(
|
|
azure_endpoint="<Your Azure endpoint e.g. `https://your-company.azure.openai.com/>",
|
|
api_key=Secret.from_token("<your-api-key>"),
|
|
azure_deployment="<this a model name, e.g. gpt-4o-mini>")
|
|
response = client.run(messages)
|
|
print(response)
|
|
```
|
|
|
|
```
|
|
{'replies':
|
|
[ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text=
|
|
"Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
|
|
enabling computers to understand, interpret, and generate human language in a way that is useful.")],
|
|
_name=None,
|
|
_meta={'model': 'gpt-4o-mini', 'index': 0, 'finish_reason': 'stop',
|
|
'usage': {'prompt_tokens': 15, 'completion_tokens': 36, 'total_tokens': 51}})]
|
|
}
|
|
```

<a id="chat/azure.AzureOpenAIChatGenerator.__init__"></a>

#### AzureOpenAIChatGenerator.\_\_init\_\_

```python
def __init__(azure_endpoint: Optional[str] = None,
             api_version: Optional[str] = "2023-05-15",
             azure_deployment: Optional[str] = "gpt-4o-mini",
             api_key: Optional[Secret] = Secret.from_env_var(
                 "AZURE_OPENAI_API_KEY", strict=False),
             azure_ad_token: Optional[Secret] = Secret.from_env_var(
                 "AZURE_OPENAI_AD_TOKEN", strict=False),
             organization: Optional[str] = None,
             streaming_callback: Optional[StreamingCallbackT] = None,
             timeout: Optional[float] = None,
             max_retries: Optional[int] = None,
             generation_kwargs: Optional[dict[str, Any]] = None,
             default_headers: Optional[dict[str, str]] = None,
             tools: Optional[ToolsType] = None,
             tools_strict: bool = False,
             *,
             azure_ad_token_provider: Optional[Union[
                 AzureADTokenProvider, AsyncAzureADTokenProvider]] = None,
             http_client_kwargs: Optional[dict[str, Any]] = None)
```

Initialize the Azure OpenAI Chat Generator component.

**Arguments**:

- `azure_endpoint`: The endpoint of the deployed model, for example `"https://example-resource.azure.openai.com/"`.
- `api_version`: The version of the API to use. Defaults to 2023-05-15.
- `azure_deployment`: The deployment of the model, usually the model name.
- `api_key`: The API key to use for authentication.
- `azure_ad_token`: [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
- `organization`: Your organization ID, defaults to `None`. For help, see
[Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- `streaming_callback`: A callback function called when a new token is received from the stream.
It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
as an argument.
- `timeout`: Timeout for OpenAI client calls. If not set, it defaults to either the
`OPENAI_TIMEOUT` environment variable or 30 seconds.
- `max_retries`: Maximum number of retries to contact OpenAI after an internal error.
If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable or 5.
- `generation_kwargs`: Other parameters to use for the model. These parameters are sent directly to
the OpenAI endpoint. For details, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
Some of the supported parameters:
- `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
including visible output tokens and reasoning tokens.
- `temperature`: The sampling temperature to use. Higher values mean the model takes more risks.
Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
- `top_p`: Nucleus sampling is an alternative to sampling with temperature, where the model considers
tokens with a top_p probability mass. For example, 0.1 means only the tokens comprising
the top 10% probability mass are considered.
- `n`: The number of completions to generate for each prompt. For example, with 3 prompts and n=2,
the LLM will generate two completions per prompt, resulting in 6 completions total.
- `stop`: One or more sequences after which the LLM should stop generating tokens.
- `presence_penalty`: The penalty applied if a token is already present.
Higher values make the model less likely to repeat the token.
- `frequency_penalty`: Penalty applied if a token has already been generated.
Higher values make the model less likely to repeat the token.
- `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
values are the bias to add to that token.
- `response_format`: A JSON schema or a Pydantic model that enforces the structure of the model's response.
If provided, the output will always be validated against this
format (unless the model returns a tool call).
For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
Notes:
- This parameter accepts Pydantic models and JSON schemas for the latest models, starting from GPT-4o.
Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
- For structured outputs with streaming,
the `response_format` must be a JSON schema and not a Pydantic model.
- `default_headers`: Default headers to use for the AzureOpenAI client.
- `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
- `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
the schema provided in the `parameters` field of the tool definition, but this may increase latency.
- `azure_ad_token_provider`: A function that returns an Azure Active Directory token, invoked on
every request.
- `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
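
As a sketch of the `response_format` option described above, you can pass a
minimal Pydantic model through `generation_kwargs` (endpoint and deployment
placeholders are illustrative):

```python
from pydantic import BaseModel

from haystack.components.generators.chat import AzureOpenAIChatGenerator
from haystack.dataclasses import ChatMessage


class CityInfo(BaseModel):
    city: str
    country: str


client = AzureOpenAIChatGenerator(
    azure_endpoint="<your-endpoint>",
    azure_deployment="<your-deployment>",
    # enforce the structure of the model's reply
    generation_kwargs={"response_format": CityInfo})

response = client.run([ChatMessage.from_user("Where is the Eiffel Tower?")])
print(response["replies"][0].text)  # JSON text matching the CityInfo schema
```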

<a id="chat/azure.AzureOpenAIChatGenerator.to_dict"></a>

#### AzureOpenAIChatGenerator.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns**:

The serialized component as a dictionary.

<a id="chat/azure.AzureOpenAIChatGenerator.from_dict"></a>

#### AzureOpenAIChatGenerator.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "AzureOpenAIChatGenerator"
```

Deserialize this component from a dictionary.

**Arguments**:

- `data`: The dictionary representation of this component.

**Returns**:

The deserialized component instance.

<a id="chat/azure.AzureOpenAIChatGenerator.run"></a>

#### AzureOpenAIChatGenerator.run

```python
@component.output_types(replies=list[ChatMessage])
def run(messages: list[ChatMessage],
        streaming_callback: Optional[StreamingCallbackT] = None,
        generation_kwargs: Optional[dict[str, Any]] = None,
        *,
        tools: Optional[ToolsType] = None,
        tools_strict: Optional[bool] = None)
```

Invokes chat completion based on the provided messages and generation parameters.

**Arguments**:

- `messages`: A list of ChatMessage instances representing the input messages.
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
- `generation_kwargs`: Additional keyword arguments for text generation. These parameters will
override the parameters passed during component initialization.
For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create).
- `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
If set, it will override the `tools` parameter provided during initialization.
- `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
the schema provided in the `parameters` field of the tool definition, but this may increase latency.
If set, it will override the `tools_strict` parameter set during component initialization.

**Returns**:

A dictionary with the following key:
- `replies`: A list containing the generated responses as ChatMessage instances.
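
A minimal sketch of the `tools` parameter in action (the weather tool is a
made-up example; endpoint placeholders are illustrative). Note that the reply
contains the tool calls the model prepared, not the tool's output:

```python
from haystack.components.generators.chat import AzureOpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.tools import Tool


def get_weather(city: str) -> str:
    return f"Sunny in {city}"


weather_tool = Tool(
    name="get_weather",
    description="Get the current weather for a city.",
    parameters={
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
    function=get_weather)

client = AzureOpenAIChatGenerator(
    azure_endpoint="<your-endpoint>",
    azure_deployment="<your-deployment>")
response = client.run(
    [ChatMessage.from_user("What's the weather in Paris?")],
    tools=[weather_tool])

print(response["replies"][0].tool_calls)
```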

<a id="chat/azure.AzureOpenAIChatGenerator.run_async"></a>

#### AzureOpenAIChatGenerator.run\_async

```python
@component.output_types(replies=list[ChatMessage])
async def run_async(messages: list[ChatMessage],
                    streaming_callback: Optional[StreamingCallbackT] = None,
                    generation_kwargs: Optional[dict[str, Any]] = None,
                    *,
                    tools: Optional[ToolsType] = None,
                    tools_strict: Optional[bool] = None)
```

Asynchronously invokes chat completion based on the provided messages and generation parameters.

This is the asynchronous version of the `run` method. It has the same parameters and return values
but can be used with `await` in async code.

**Arguments**:

- `messages`: A list of ChatMessage instances representing the input messages.
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
Must be a coroutine.
- `generation_kwargs`: Additional keyword arguments for text generation. These parameters will
override the parameters passed during component initialization.
For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create).
- `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
If set, it will override the `tools` parameter provided during initialization.
- `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
the schema provided in the `parameters` field of the tool definition, but this may increase latency.
If set, it will override the `tools_strict` parameter set during component initialization.

**Returns**:

A dictionary with the following key:
- `replies`: A list containing the generated responses as ChatMessage instances.
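
A minimal sketch of calling the asynchronous variant with `asyncio` (endpoint
and deployment placeholders are illustrative):

```python
import asyncio

from haystack.components.generators.chat import AzureOpenAIChatGenerator
from haystack.dataclasses import ChatMessage


async def main():
    client = AzureOpenAIChatGenerator(
        azure_endpoint="<your-endpoint>",
        azure_deployment="<your-deployment>")
    response = await client.run_async(
        [ChatMessage.from_user("What's Natural Language Processing? Be brief.")])
    print(response["replies"][0].text)


asyncio.run(main())
```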

<a id="chat/hugging_face_local"></a>

# Module chat/hugging\_face\_local

<a id="chat/hugging_face_local.default_tool_parser"></a>

#### default\_tool\_parser

```python
def default_tool_parser(text: str) -> Optional[list[ToolCall]]
```

Default implementation for parsing tool calls from model output text.

Uses DEFAULT_TOOL_PATTERN to extract tool calls.

**Arguments**:

- `text`: The text to parse for tool calls.

**Returns**:

A list containing a single ToolCall if a valid tool call is found, None otherwise.
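
The parser contract is simple: text in, a list of `ToolCall` objects (or `None`)
out. A sketch of a custom parser for a hypothetical
`<tool>name|{"arg": "value"}</tool>` output format, which you could pass to the
generator below via `tool_parsing_function`:

```python
import json
import re
from typing import Optional

from haystack.dataclasses import ToolCall

# hypothetical output format: <tool>tool_name|{"arg": "value"}</tool>
TOOL_RE = re.compile(r"<tool>(\w+)\|(\{.*?\})</tool>", re.DOTALL)


def my_tool_parser(text: str) -> Optional[list[ToolCall]]:
    match = TOOL_RE.search(text)
    if match is None:
        return None
    return [ToolCall(tool_name=match.group(1),
                     arguments=json.loads(match.group(2)))]
```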

<a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator"></a>

## HuggingFaceLocalChatGenerator

Generates chat responses using models from Hugging Face that run locally.

Use this component with chat-based models,
such as `HuggingFaceH4/zephyr-7b-beta` or `meta-llama/Llama-2-7b-chat-hf`.
LLMs running locally may need powerful hardware.

### Usage example

```python
from haystack.components.generators.chat import HuggingFaceLocalChatGenerator
from haystack.dataclasses import ChatMessage

generator = HuggingFaceLocalChatGenerator(model="HuggingFaceH4/zephyr-7b-beta")
generator.warm_up()
messages = [ChatMessage.from_user("What's Natural Language Processing? Be brief.")]
print(generator.run(messages))
```

```
{'replies':
[ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text=
"Natural Language Processing (NLP) is a subfield of artificial intelligence that deals
with the interaction between computers and human language. It enables computers to understand, interpret, and
generate human language in a valuable way. NLP involves various techniques such as speech recognition, text
analysis, sentiment analysis, and machine translation. The ultimate goal is to make it easier for computers to
process and derive meaning from human language, improving communication between humans and machines.")],
_name=None,
_meta={'finish_reason': 'stop', 'index': 0, 'model':
'mistralai/Mistral-7B-Instruct-v0.2',
'usage': {'completion_tokens': 90, 'prompt_tokens': 19, 'total_tokens': 109}})
]
}
```

<a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.__init__"></a>

#### HuggingFaceLocalChatGenerator.\_\_init\_\_

```python
def __init__(model: str = "HuggingFaceH4/zephyr-7b-beta",
             task: Optional[Literal["text-generation",
                                    "text2text-generation"]] = None,
             device: Optional[ComponentDevice] = None,
             token: Optional[Secret] = Secret.from_env_var(
                 ["HF_API_TOKEN", "HF_TOKEN"], strict=False),
             chat_template: Optional[str] = None,
             generation_kwargs: Optional[dict[str, Any]] = None,
             huggingface_pipeline_kwargs: Optional[dict[str, Any]] = None,
             stop_words: Optional[list[str]] = None,
             streaming_callback: Optional[StreamingCallbackT] = None,
             tools: Optional[ToolsType] = None,
             tool_parsing_function: Optional[Callable[
                 [str], Optional[list[ToolCall]]]] = None,
             async_executor: Optional[ThreadPoolExecutor] = None) -> None
```

Initializes the HuggingFaceLocalChatGenerator component.

**Arguments**:

- `model`: The Hugging Face text generation model name or path,
for example, `mistralai/Mistral-7B-Instruct-v0.2` or `TheBloke/OpenHermes-2.5-Mistral-7B-16k-AWQ`.
The model must be a chat model supporting the ChatML messaging format.
If the model is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
- `task`: The task for the Hugging Face pipeline. Possible options:
- `text-generation`: Supported by decoder models, like GPT.
- `text2text-generation`: Supported by encoder-decoder models, like T5.
If the task is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
If not specified, the component calls the Hugging Face API to infer the task from the model name.
- `device`: The device for loading the model. If `None`, automatically selects the default device.
If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter.
- `token`: The token to use as HTTP bearer authorization for remote files.
If the token is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
- `chat_template`: Specifies an optional Jinja template for formatting chat
messages. Most high-quality chat models have their own templates, but for models without this
feature or if you prefer a custom template, use this parameter.
- `generation_kwargs`: A dictionary with keyword arguments to customize text generation.
Some examples: `max_length`, `max_new_tokens`, `temperature`, `top_k`, `top_p`.
See Hugging Face's documentation for more information:
- [customize-text-generation](https://huggingface.co/docs/transformers/main/en/generation_strategies#customize-text-generation)
- [GenerationConfig](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationConfig)
The only `generation_kwargs` set by default is `max_new_tokens`, which is set to 512 tokens.
- `huggingface_pipeline_kwargs`: Dictionary with keyword arguments to initialize the
Hugging Face pipeline for text generation.
These keyword arguments provide fine-grained control over the Hugging Face pipeline.
In case of duplication, these kwargs override `model`, `task`, `device`, and `token` init parameters.
For kwargs, see [Hugging Face documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.pipeline.task).
In this dictionary, you can also include `model_kwargs` to specify the kwargs for [model initialization](https://huggingface.co/docs/transformers/en/main_classes/model#transformers.PreTrainedModel.from_pretrained).
- `stop_words`: A list of stop words. If the model generates a stop word, the generation stops.
If you provide this parameter, don't specify the `stopping_criteria` in `generation_kwargs`.
For some chat models, the output includes both the new text and the original prompt.
In these cases, make sure your prompt has no stop words.
- `streaming_callback`: An optional callable for handling streaming responses.
- `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
- `tool_parsing_function`: A callable that takes a string and returns a list of ToolCall objects or None.
If None, the default_tool_parser will be used, which extracts tool calls using a predefined pattern.
- `async_executor`: Optional ThreadPoolExecutor to use for async calls. If not provided, a single-threaded
executor will be initialized and used.

<a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.__del__"></a>

#### HuggingFaceLocalChatGenerator.\_\_del\_\_

```python
def __del__() -> None
```

Cleanup when the instance is being destroyed.

<a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.shutdown"></a>

#### HuggingFaceLocalChatGenerator.shutdown

```python
def shutdown() -> None
```

Explicitly shut down the executor if we own it.

<a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.warm_up"></a>

#### HuggingFaceLocalChatGenerator.warm\_up

```python
def warm_up() -> None
```

Initializes the component.

<a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.to_dict"></a>

#### HuggingFaceLocalChatGenerator.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns**:

Dictionary with serialized data.

<a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.from_dict"></a>

#### HuggingFaceLocalChatGenerator.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "HuggingFaceLocalChatGenerator"
```

Deserializes the component from a dictionary.

**Arguments**:

- `data`: The dictionary to deserialize from.

**Returns**:

The deserialized component.

<a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.run"></a>

#### HuggingFaceLocalChatGenerator.run

```python
@component.output_types(replies=list[ChatMessage])
def run(messages: list[ChatMessage],
        generation_kwargs: Optional[dict[str, Any]] = None,
        streaming_callback: Optional[StreamingCallbackT] = None,
        tools: Optional[ToolsType] = None) -> dict[str, list[ChatMessage]]
```

Invoke text generation inference based on the provided messages and generation parameters.

**Arguments**:

- `messages`: A list of ChatMessage objects representing the input messages.
- `generation_kwargs`: Additional keyword arguments for text generation.
- `streaming_callback`: An optional callable for handling streaming responses.
- `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
If set, it will override the `tools` parameter provided during initialization.

**Returns**:

A dictionary with the following keys:
- `replies`: A list containing the generated responses as ChatMessage instances.

<a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.create_message"></a>

#### HuggingFaceLocalChatGenerator.create\_message

```python
def create_message(text: str,
                   index: int,
                   tokenizer: Union["PreTrainedTokenizer",
                                    "PreTrainedTokenizerFast"],
                   prompt: str,
                   generation_kwargs: dict[str, Any],
                   parse_tool_calls: bool = False) -> ChatMessage
```

Create a ChatMessage instance from the provided text, populated with metadata.

**Arguments**:

- `text`: The generated text.
- `index`: The index of the generated text.
- `tokenizer`: The tokenizer used for generation.
- `prompt`: The prompt used for generation.
- `generation_kwargs`: The generation parameters.
- `parse_tool_calls`: Whether to attempt parsing tool calls from the text.

**Returns**:

A ChatMessage instance.

<a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.run_async"></a>

#### HuggingFaceLocalChatGenerator.run\_async

```python
@component.output_types(replies=list[ChatMessage])
async def run_async(
        messages: list[ChatMessage],
        generation_kwargs: Optional[dict[str, Any]] = None,
        streaming_callback: Optional[StreamingCallbackT] = None,
        tools: Optional[ToolsType] = None) -> dict[str, list[ChatMessage]]
```

Asynchronously invokes text generation inference based on the provided messages and generation parameters.

This is the asynchronous version of the `run` method. It has the same parameters
and return values but can be used with `await` in async code.

**Arguments**:

- `messages`: A list of ChatMessage objects representing the input messages.
- `generation_kwargs`: Additional keyword arguments for text generation.
- `streaming_callback`: An optional callable for handling streaming responses.
- `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
If set, it will override the `tools` parameter provided during initialization.

**Returns**:

A dictionary with the following keys:
- `replies`: A list containing the generated responses as ChatMessage instances.

<a id="chat/hugging_face_api"></a>

# Module chat/hugging\_face\_api

<a id="chat/hugging_face_api.HuggingFaceAPIChatGenerator"></a>

## HuggingFaceAPIChatGenerator

Completes chats using Hugging Face APIs.

HuggingFaceAPIChatGenerator uses the [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
format for input and output. Use it to generate text with Hugging Face APIs:
- [Serverless Inference API (Inference Providers)](https://huggingface.co/docs/inference-providers)
- [Paid Inference Endpoints](https://huggingface.co/inference-endpoints)
- [Self-hosted Text Generation Inference](https://github.com/huggingface/text-generation-inference)

### Usage examples

#### With the serverless inference API (Inference Providers) - free tier available

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret
from haystack.utils.hf import HFGenerationAPIType

messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]

# the api_type can be expressed using the HFGenerationAPIType enum or as a string
api_type = HFGenerationAPIType.SERVERLESS_INFERENCE_API
api_type = "serverless_inference_api"  # this is equivalent to the above

generator = HuggingFaceAPIChatGenerator(api_type=api_type,
                                        api_params={"model": "Qwen/Qwen2.5-7B-Instruct",
                                                    "provider": "together"},
                                        token=Secret.from_token("<your-api-key>"))

result = generator.run(messages)
print(result)
```
|
|
|
|
#### With the serverless inference API (Inference Providers) and text+image input

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage, ImageContent
from haystack.utils import Secret
from haystack.utils.hf import HFGenerationAPIType

# Create an image from file path, URL, or base64
image = ImageContent.from_file_path("path/to/your/image.jpg")

# Create a multimodal message with both text and image
messages = [ChatMessage.from_user(content_parts=["Describe this image in detail", image])]

generator = HuggingFaceAPIChatGenerator(
    api_type=HFGenerationAPIType.SERVERLESS_INFERENCE_API,
    api_params={
        "model": "Qwen/Qwen2.5-VL-7B-Instruct", # Vision Language Model
        "provider": "hyperbolic"
    },
    token=Secret.from_token("<your-api-key>")
)

result = generator.run(messages)
print(result)
```

#### With paid inference endpoints

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret

messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]

generator = HuggingFaceAPIChatGenerator(api_type="inference_endpoints",
                                        api_params={"url": "<your-inference-endpoint-url>"},
                                        token=Secret.from_token("<your-api-key>"))

result = generator.run(messages)
print(result)
```

#### With self-hosted text generation inference

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage

messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]

generator = HuggingFaceAPIChatGenerator(api_type="text_generation_inference",
                                        api_params={"url": "http://localhost:8080"})

result = generator.run(messages)
print(result)
```

<a id="chat/hugging_face_api.HuggingFaceAPIChatGenerator.__init__"></a>
|
|
|
|
#### HuggingFaceAPIChatGenerator.\_\_init\_\_
|
|
|
|
```python
|
|
def __init__(api_type: Union[HFGenerationAPIType, str],
|
|
api_params: dict[str, str],
|
|
token: Optional[Secret] = Secret.from_env_var(
|
|
["HF_API_TOKEN", "HF_TOKEN"], strict=False),
|
|
generation_kwargs: Optional[dict[str, Any]] = None,
|
|
stop_words: Optional[list[str]] = None,
|
|
streaming_callback: Optional[StreamingCallbackT] = None,
|
|
tools: Optional[ToolsType] = None)
|
|
```
|
|
|
|
Initialize the HuggingFaceAPIChatGenerator instance.

**Arguments**:

- `api_type`: The type of Hugging Face API to use. Available types:
- `text_generation_inference`: See [TGI](https://github.com/huggingface/text-generation-inference).
- `inference_endpoints`: See [Inference Endpoints](https://huggingface.co/inference-endpoints).
- `serverless_inference_api`: See
[Serverless Inference API - Inference Providers](https://huggingface.co/docs/inference-providers).
- `api_params`: A dictionary with the following keys:
- `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`.
- `provider`: Provider name. Recommended when `api_type` is `SERVERLESS_INFERENCE_API`.
- `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or
`TEXT_GENERATION_INFERENCE`.
- Other parameters specific to the chosen API type, such as `timeout`, `headers`, etc.
- `token`: The Hugging Face token to use as HTTP bearer authorization.
Check your HF token in your [account settings](https://huggingface.co/settings/tokens).
- `generation_kwargs`: A dictionary with keyword arguments to customize text generation.
Some examples: `max_tokens`, `temperature`, `top_p`. See the sketch after this list.
For details, see [Hugging Face chat_completion documentation](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.chat_completion).
- `stop_words`: An optional list of strings representing the stop words.
- `streaming_callback`: An optional callable for handling streaming responses.
- `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
The chosen model should support tool/function calling, according to the model card.
Support for tools in the Hugging Face API and TGI is not yet fully refined and you may experience
unexpected behavior.

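For instance, a hedged sketch of customizing generation at initialization time (the parameter values and the stop word are placeholders; the URL assumes a locally running TGI container):

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator

# generation_kwargs follow the Hugging Face chat_completion API;
# stop_words is applied on top of them
generator = HuggingFaceAPIChatGenerator(
    api_type="text_generation_inference",
    api_params={"url": "http://localhost:8080"},
    generation_kwargs={"max_tokens": 256, "temperature": 0.7},
    stop_words=["Observation:"],
)
```
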
<a id="chat/hugging_face_api.HuggingFaceAPIChatGenerator.to_dict"></a>
|
|
|
|
#### HuggingFaceAPIChatGenerator.to\_dict
|
|
|
|
```python
|
|
def to_dict() -> dict[str, Any]
|
|
```
|
|
|
|
Serialize this component to a dictionary.
|
|
|
|
**Returns**:
|
|
|
|
A dictionary containing the serialized component.
|
|
|
|
<a id="chat/hugging_face_api.HuggingFaceAPIChatGenerator.from_dict"></a>
|
|
|
|
#### HuggingFaceAPIChatGenerator.from\_dict
|
|
|
|
```python
|
|
@classmethod
|
|
def from_dict(cls, data: dict[str, Any]) -> "HuggingFaceAPIChatGenerator"
|
|
```
|
|
|
|
Deserialize this component from a dictionary.
|
|
|
|
<a id="chat/hugging_face_api.HuggingFaceAPIChatGenerator.run"></a>
|
|
|
|
#### HuggingFaceAPIChatGenerator.run
|
|
|
|
```python
|
|
@component.output_types(replies=list[ChatMessage])
|
|
def run(messages: list[ChatMessage],
|
|
generation_kwargs: Optional[dict[str, Any]] = None,
|
|
tools: Optional[ToolsType] = None,
|
|
streaming_callback: Optional[StreamingCallbackT] = None)
|
|
```
|
|
|
|
Invoke the text generation inference based on the provided messages and generation parameters.
|
|
|
|
**Arguments**:
|
|
|
|
- `messages`: A list of ChatMessage objects representing the input messages.
|
|
- `generation_kwargs`: Additional keyword arguments for text generation.
|
|
- `tools`: A list of tools or a Toolset for which the model can prepare calls. If set, it will override
|
|
the `tools` parameter set during component initialization. This parameter can accept either a
|
|
list of `Tool` objects or a `Toolset` instance.
|
|
- `streaming_callback`: An optional callable for handling streaming responses. If set, it will override the `streaming_callback`
|
|
parameter set during component initialization.
|
|
|
|
**Returns**:
|
|
|
|
A dictionary with the following keys:
|
|
- `replies`: A list containing the generated responses as ChatMessage objects.
|
|
|
|
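As a sketch, `run` can stream tokens to the console with Haystack's `print_streaming_chunk` utility, assuming a `generator` configured as in the usage examples above:

```python
from haystack.components.generators.utils import print_streaming_chunk
from haystack.dataclasses import ChatMessage

messages = [ChatMessage.from_user("What's Natural Language Processing?")]

# The per-call callback overrides any streaming_callback set at init time
result = generator.run(messages, streaming_callback=print_streaming_chunk)
```
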
<a id="chat/hugging_face_api.HuggingFaceAPIChatGenerator.run_async"></a>
|
|
|
|
#### HuggingFaceAPIChatGenerator.run\_async
|
|
|
|
```python
|
|
@component.output_types(replies=list[ChatMessage])
|
|
async def run_async(messages: list[ChatMessage],
|
|
generation_kwargs: Optional[dict[str, Any]] = None,
|
|
tools: Optional[ToolsType] = None,
|
|
streaming_callback: Optional[StreamingCallbackT] = None)
|
|
```
|
|
|
|
Asynchronously invokes the text generation inference based on the provided messages and generation parameters.
|
|
|
|
This is the asynchronous version of the `run` method. It has the same parameters
|
|
and return values but can be used with `await` in an async code.
|
|
|
|
**Arguments**:
|
|
|
|
- `messages`: A list of ChatMessage objects representing the input messages.
|
|
- `generation_kwargs`: Additional keyword arguments for text generation.
|
|
- `tools`: A list of tools or a Toolset for which the model can prepare calls. If set, it will override the `tools`
|
|
parameter set during component initialization. This parameter can accept either a list of `Tool` objects
|
|
or a `Toolset` instance.
|
|
- `streaming_callback`: An optional callable for handling streaming responses. If set, it will override the `streaming_callback`
|
|
parameter set during component initialization.
|
|
|
|
**Returns**:
|
|
|
|
A dictionary with the following keys:
|
|
- `replies`: A list containing the generated responses as ChatMessage objects.
|
|
|
|
<a id="chat/openai"></a>
|
|
|
|
# Module chat/openai
|
|
|
|
<a id="chat/openai.OpenAIChatGenerator"></a>
|
|
|
|
## OpenAIChatGenerator
|
|
|
|
Completes chats using OpenAI's large language models (LLMs).
|
|
|
|
It works with the gpt-4 and o-series models and supports streaming responses
|
|
from OpenAI API. It uses [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
|
|
format in input and output.
|
|
|
|
You can customize how the text is generated by passing parameters to the
|
|
OpenAI API. Use the `**generation_kwargs` argument when you initialize
|
|
the component or when you run it. Any parameter that works with
|
|
`openai.ChatCompletion.create` will work here too.
|
|
|
|
For details on OpenAI API parameters, see
|
|
[OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
|
|
|
|
### Usage example

```python
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage

messages = [ChatMessage.from_user("What's Natural Language Processing?")]

client = OpenAIChatGenerator()
response = client.run(messages)
print(response)
```
Output:
```
{'replies':
    [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=
        [TextContent(text="Natural Language Processing (NLP) is a branch of artificial intelligence
        that focuses on enabling computers to understand, interpret, and generate human language in
        a way that is meaningful and useful.")],
        _name=None,
        _meta={'model': 'gpt-4o-mini', 'index': 0, 'finish_reason': 'stop',
        'usage': {'prompt_tokens': 15, 'completion_tokens': 36, 'total_tokens': 51}})
    ]
}
```

<a id="chat/openai.OpenAIChatGenerator.__init__"></a>
|
|
|
|
#### OpenAIChatGenerator.\_\_init\_\_
|
|
|
|
```python
|
|
def __init__(api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
|
|
model: str = "gpt-4o-mini",
|
|
streaming_callback: Optional[StreamingCallbackT] = None,
|
|
api_base_url: Optional[str] = None,
|
|
organization: Optional[str] = None,
|
|
generation_kwargs: Optional[dict[str, Any]] = None,
|
|
timeout: Optional[float] = None,
|
|
max_retries: Optional[int] = None,
|
|
tools: Optional[ToolsType] = None,
|
|
tools_strict: bool = False,
|
|
http_client_kwargs: Optional[dict[str, Any]] = None)
|
|
```
|
|
|
|
Creates an instance of OpenAIChatGenerator. Unless specified otherwise in `model`, uses OpenAI's gpt-4o-mini model.

Before initializing the component, you can set the 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES'
environment variables to override the `timeout` and `max_retries` parameters respectively
in the OpenAI client.

**Arguments**:

- `api_key`: The OpenAI API key.
You can set it with the `OPENAI_API_KEY` environment variable, or pass it with this parameter
during initialization.
- `model`: The name of the model to use.
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
The callback function accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
as an argument.
- `api_base_url`: An optional base URL.
- `organization`: Your organization ID, defaults to `None`. See
[production best practices](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- `generation_kwargs`: Other parameters to use for the model. These parameters are sent directly to
the OpenAI endpoint. See OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat) for
more details.
Some of the supported parameters:
- `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
including visible output tokens and reasoning tokens.
- `temperature`: What sampling temperature to use. Higher values mean the model will take more risks.
Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
- `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
comprising the top 10% probability mass are considered.
- `n`: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2,
it will generate two completions for each of the three prompts, ending up with 6 completions in total.
- `stop`: One or more sequences after which the LLM should stop generating tokens.
- `presence_penalty`: The penalty to apply if a token has already appeared in the text at least once.
Bigger values mean the model will be less likely to repeat the same token in the text.
- `frequency_penalty`: The penalty to apply each time a token has already been generated in the text.
Bigger values mean the model will be less likely to repeat the same token in the text.
- `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
values are the bias to add to that token.
- `response_format`: A JSON schema or a Pydantic model that enforces the structure of the model's response.
If provided, the output will always be validated against this
format (unless the model returns a tool call). See the sketch after this list for an example.
For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
Notes:
- This parameter accepts Pydantic models and JSON schemas for the latest models, starting from GPT-4o.
Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
- For structured outputs with streaming,
the `response_format` must be a JSON schema and not a Pydantic model.
- `timeout`: Timeout for OpenAI client calls. If not set, it defaults to either the
`OPENAI_TIMEOUT` environment variable, or 30 seconds.
- `max_retries`: Maximum number of retries to contact OpenAI after an internal error.
If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable, or 5.
- `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
- `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
the schema provided in the `parameters` field of the tool definition, but this may increase latency.
- `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).

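As an illustration, a hedged sketch of passing a Pydantic model as `response_format` through `generation_kwargs` (the `CityInfo` schema is made up for this example; a recent model such as gpt-4o is assumed):

```python
from pydantic import BaseModel

from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage

class CityInfo(BaseModel):
    # Hypothetical schema used only to illustrate structured outputs
    city: str
    country: str
    population: int

client = OpenAIChatGenerator(
    model="gpt-4o",
    generation_kwargs={"response_format": CityInfo},
)
response = client.run([ChatMessage.from_user("Tell me about Berlin.")])
print(response["replies"][0].text)  # JSON conforming to the CityInfo schema
```
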
<a id="chat/openai.OpenAIChatGenerator.to_dict"></a>
|
|
|
|
#### OpenAIChatGenerator.to\_dict
|
|
|
|
```python
|
|
def to_dict() -> dict[str, Any]
|
|
```
|
|
|
|
Serialize this component to a dictionary.
|
|
|
|
**Returns**:
|
|
|
|
The serialized component as a dictionary.
|
|
|
|
<a id="chat/openai.OpenAIChatGenerator.from_dict"></a>
|
|
|
|
#### OpenAIChatGenerator.from\_dict
|
|
|
|
```python
|
|
@classmethod
|
|
def from_dict(cls, data: dict[str, Any]) -> "OpenAIChatGenerator"
|
|
```
|
|
|
|
Deserialize this component from a dictionary.
|
|
|
|
**Arguments**:
|
|
|
|
- `data`: The dictionary representation of this component.
|
|
|
|
**Returns**:
|
|
|
|
The deserialized component instance.
|
|
|
|
<a id="chat/openai.OpenAIChatGenerator.run"></a>
|
|
|
|
#### OpenAIChatGenerator.run
|
|
|
|
```python
|
|
@component.output_types(replies=list[ChatMessage])
|
|
def run(messages: list[ChatMessage],
|
|
streaming_callback: Optional[StreamingCallbackT] = None,
|
|
generation_kwargs: Optional[dict[str, Any]] = None,
|
|
*,
|
|
tools: Optional[ToolsType] = None,
|
|
tools_strict: Optional[bool] = None)
|
|
```
|
|
|
|
Invokes chat completion based on the provided messages and generation parameters.
|
|
|
|
**Arguments**:
|
|
|
|
- `messages`: A list of ChatMessage instances representing the input messages.
|
|
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
|
|
- `generation_kwargs`: Additional keyword arguments for text generation. These parameters will
|
|
override the parameters passed during component initialization.
|
|
For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create).
|
|
- `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
|
|
If set, it will override the `tools` parameter provided during initialization.
|
|
- `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
|
|
the schema provided in the `parameters` field of the tool definition, but this may increase latency.
|
|
If set, it will override the `tools_strict` parameter set during component initialization.
|
|
|
|
**Returns**:
|
|
|
|
A dictionary with the following key:
|
|
- `replies`: A list containing the generated responses as ChatMessage instances.
|
|
|
|
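As a sketch, a tool can be passed at run time; the `weather` function and its schema are hypothetical, and `haystack.tools.Tool` is used here under the assumption that the chosen model supports function calling:

```python
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.tools import Tool

def weather(city: str) -> str:
    # Hypothetical stand-in for a real weather lookup
    return f"Sunny in {city}"

weather_tool = Tool(
    name="weather",
    description="Get the weather for a city.",
    parameters={
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
    function=weather,
)

client = OpenAIChatGenerator()
result = client.run(
    [ChatMessage.from_user("What's the weather in Paris?")],
    tools=[weather_tool],
)
# The reply carries the model's tool call rather than plain text
print(result["replies"][0].tool_calls)
```
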
<a id="chat/openai.OpenAIChatGenerator.run_async"></a>
|
|
|
|
#### OpenAIChatGenerator.run\_async
|
|
|
|
```python
|
|
@component.output_types(replies=list[ChatMessage])
|
|
async def run_async(messages: list[ChatMessage],
|
|
streaming_callback: Optional[StreamingCallbackT] = None,
|
|
generation_kwargs: Optional[dict[str, Any]] = None,
|
|
*,
|
|
tools: Optional[ToolsType] = None,
|
|
tools_strict: Optional[bool] = None)
|
|
```
|
|
|
|
Asynchronously invokes chat completion based on the provided messages and generation parameters.
|
|
|
|
This is the asynchronous version of the `run` method. It has the same parameters and return values
|
|
but can be used with `await` in async code.
|
|
|
|
**Arguments**:
|
|
|
|
- `messages`: A list of ChatMessage instances representing the input messages.
|
|
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
|
|
Must be a coroutine.
|
|
- `generation_kwargs`: Additional keyword arguments for text generation. These parameters will
|
|
override the parameters passed during component initialization.
|
|
For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create).
|
|
- `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
|
|
If set, it will override the `tools` parameter provided during initialization.
|
|
- `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
|
|
the schema provided in the `parameters` field of the tool definition, but this may increase latency.
|
|
If set, it will override the `tools_strict` parameter set during component initialization.
|
|
|
|
**Returns**:
|
|
|
|
A dictionary with the following key:
|
|
- `replies`: A list containing the generated responses as ChatMessage instances.
|
|
|
|
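A minimal async sketch; note that any `streaming_callback` passed here would have to be a coroutine:

```python
import asyncio

from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage

async def main():
    client = OpenAIChatGenerator()
    result = await client.run_async([ChatMessage.from_user("What's Natural Language Processing?")])
    print(result["replies"][0].text)

asyncio.run(main())
```
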
<a id="chat/fallback"></a>
|
|
|
|
# Module chat/fallback
|
|
|
|
<a id="chat/fallback.FallbackChatGenerator"></a>
|
|
|
|
## FallbackChatGenerator
|
|
|
|
A chat generator wrapper that tries multiple chat generators sequentially.
|
|
|
|
It forwards all parameters transparently to the underlying chat generators and returns the first successful result.
|
|
Calls chat generators sequentially until one succeeds. Falls back on any exception raised by a generator.
|
|
If all chat generators fail, it raises a RuntimeError with details.
|
|
|
|
Timeout enforcement is fully delegated to the underlying chat generators. The fallback mechanism will only
|
|
work correctly if the underlying chat generators implement proper timeout handling and raise exceptions
|
|
when timeouts occur. For predictable latency guarantees, ensure your chat generators:
|
|
- Support a `timeout` parameter in their initialization
|
|
- Implement timeout as total wall-clock time (shared deadline for both streaming and non-streaming)
|
|
- Raise timeout exceptions (e.g., TimeoutError, asyncio.TimeoutError, httpx.TimeoutException) when exceeded
|
|
|
|
Note: Most well-implemented chat generators (OpenAI, Anthropic, Cohere, etc.) support timeout parameters
|
|
with consistent semantics. For HTTP-based LLM providers, a single timeout value (e.g., `timeout=30`)
|
|
typically applies to all connection phases: connection setup, read, write, and pool. For streaming
|
|
responses, read timeout is the maximum gap between chunks. For non-streaming, it's the time limit for
|
|
receiving the complete response.
|
|
|
|
Failover is automatically triggered when a generator raises any exception, including:
|
|
- Timeout errors (if the generator implements and raises them)
|
|
- Rate limit errors (429)
|
|
- Authentication errors (401)
|
|
- Context length errors (400)
|
|
- Server errors (500+)
|
|
- Any other exception
|
|
|
|
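A hedged construction sketch, assuming `FallbackChatGenerator` is exported from `haystack.components.generators.chat` alongside the other chat generators, with two differently configured `OpenAIChatGenerator` instances as primary and backup:

```python
from haystack.components.generators.chat import FallbackChatGenerator, OpenAIChatGenerator
from haystack.dataclasses import ChatMessage

# Tried in order: the second generator only runs if the first raises
fallback = FallbackChatGenerator(chat_generators=[
    OpenAIChatGenerator(model="gpt-4o", timeout=10),
    OpenAIChatGenerator(model="gpt-4o-mini", timeout=30),
])

messages = [ChatMessage.from_user("What's Natural Language Processing?")]
result = fallback.run(messages)
print(result["replies"][0].text)
```
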
<a id="chat/fallback.FallbackChatGenerator.__init__"></a>
|
|
|
|
#### FallbackChatGenerator.\_\_init\_\_
|
|
|
|
```python
|
|
def __init__(chat_generators: list[ChatGenerator])
|
|
```
|
|
|
|
Creates an instance of FallbackChatGenerator.
|
|
|
|
**Arguments**:
|
|
|
|
- `chat_generators`: A non-empty list of chat generator components to try in order.
|
|
|
|
<a id="chat/fallback.FallbackChatGenerator.to_dict"></a>
|
|
|
|
#### FallbackChatGenerator.to\_dict
|
|
|
|
```python
|
|
def to_dict() -> dict[str, Any]
|
|
```
|
|
|
|
Serialize the component, including nested chat generators when they support serialization.
|
|
|
|
<a id="chat/fallback.FallbackChatGenerator.from_dict"></a>
|
|
|
|
#### FallbackChatGenerator.from\_dict
|
|
|
|
```python
|
|
@classmethod
|
|
def from_dict(cls, data: dict[str, Any]) -> FallbackChatGenerator
|
|
```
|
|
|
|
Rebuild the component from a serialized representation, restoring nested chat generators.
|
|
|
|
<a id="chat/fallback.FallbackChatGenerator.run"></a>
|
|
|
|
#### FallbackChatGenerator.run
|
|
|
|
```python
|
|
@component.output_types(replies=list[ChatMessage], meta=dict[str, Any])
|
|
def run(
|
|
messages: list[ChatMessage],
|
|
generation_kwargs: Union[dict[str, Any], None] = None,
|
|
tools: Optional[ToolsType] = None,
|
|
streaming_callback: Union[StreamingCallbackT,
|
|
None] = None) -> dict[str, Any]
|
|
```
|
|
|
|
Execute chat generators sequentially until one succeeds.
|
|
|
|
**Arguments**:
|
|
|
|
- `messages`: The conversation history as a list of ChatMessage instances.
|
|
- `generation_kwargs`: Optional parameters for the chat generator (e.g., temperature, max_tokens).
|
|
- `tools`: A list of Tool and/or Toolset objects, or a single Toolset for function calling capabilities.
|
|
- `streaming_callback`: Optional callable for handling streaming responses.
|
|
|
|
**Raises**:
|
|
|
|
- `RuntimeError`: If all chat generators fail.
|
|
|
|
**Returns**:
|
|
|
|
A dictionary with:
|
|
- "replies": Generated ChatMessage instances from the first successful generator.
|
|
- "meta": Execution metadata including successful_chat_generator_index, successful_chat_generator_class,
|
|
total_attempts, failed_chat_generators, plus any metadata from the successful generator.
|
|
|
|
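For instance, the `meta` keys listed above can be inspected after a call to see which generator answered; a sketch reusing the `fallback` instance and `messages` from the earlier example:

```python
result = fallback.run(messages)
meta = result["meta"]
# Which generator succeeded, and what failed before it
print(meta["successful_chat_generator_index"], meta["successful_chat_generator_class"])
print(meta["total_attempts"], meta["failed_chat_generators"])
```
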
<a id="chat/fallback.FallbackChatGenerator.run_async"></a>
|
|
|
|
#### FallbackChatGenerator.run\_async
|
|
|
|
```python
|
|
@component.output_types(replies=list[ChatMessage], meta=dict[str, Any])
|
|
async def run_async(
|
|
messages: list[ChatMessage],
|
|
generation_kwargs: Union[dict[str, Any], None] = None,
|
|
tools: Optional[ToolsType] = None,
|
|
streaming_callback: Union[StreamingCallbackT,
|
|
None] = None) -> dict[str, Any]
|
|
```
|
|
|
|
Asynchronously execute chat generators sequentially until one succeeds.
|
|
|
|
**Arguments**:
|
|
|
|
- `messages`: The conversation history as a list of ChatMessage instances.
|
|
- `generation_kwargs`: Optional parameters for the chat generator (e.g., temperature, max_tokens).
|
|
- `tools`: A list of Tool and/or Toolset objects, or a single Toolset for function calling capabilities.
|
|
- `streaming_callback`: Optional callable for handling streaming responses.
|
|
|
|
**Raises**:
|
|
|
|
- `RuntimeError`: If all chat generators fail.
|
|
|
|
**Returns**:
|
|
|
|
A dictionary with:
|
|
- "replies": Generated ChatMessage instances from the first successful generator.
|
|
- "meta": Execution metadata including successful_chat_generator_index, successful_chat_generator_class,
|
|
total_attempts, failed_chat_generators, plus any metadata from the successful generator.
|
|
|