---
title: "LlamaStackChatGenerator"
id: llamastackchatgenerator
slug: "/llamastackchatgenerator"
description: "This component enables chat completions using any model made available by inference providers on a Llama Stack server."
---
# LlamaStackChatGenerator
This component enables chat completions using any model made available by inference providers on a Llama Stack server.
<div className="key-value-table">
| | |
| --- | --- |
| **Most common position in a pipeline** | After a [ChatPromptBuilder](../builders/chatpromptbuilder.mdx) |
| **Mandatory init variables** | `model`: The name of the model to use for chat completion. <br />This depends on the inference provider used for the Llama Stack Server. |
| **Mandatory run variables** | `messages`: A list of [`ChatMessage`](../../concepts/data-classes/chatmessage.mdx) objects representing the chat |
| **Output variables** | `replies`: A list of alternative replies of the model to the input chat |
| **API reference** | [Llama Stack](/reference/integrations-llama-stack) |
| **GitHub link** | https://github.com/deepset-ai/haystack-core-integrations/blob/main/integrations/llama_stack |
</div>
## Overview
[Llama Stack](https://llama-stack.readthedocs.io/en/latest/index.html) provides building blocks and unified APIs to streamline the development of AI applications across various environments.
The `LlamaStackChatGenerator` enables you to access any LLMs exposed by inference providers hosted on a Llama Stack server. It abstracts away the underlying provider details, allowing you to reuse the same client-side code regardless of the inference backend. For a list of supported providers and configuration options, refer to the [Llama Stack documentation](https://llama-stack.readthedocs.io/en/latest/providers/inference/index.html).
This component uses the same `ChatMessage` format as other Haystack Chat Generators for structured input and output. For more information, see the [ChatMessage documentation](../../concepts/data-classes/chatmessage.mdx).
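For instance, a chat can be composed with the role-specific factory methods and then passed to the generator's `run` method:

```python
from haystack.dataclasses import ChatMessage

# Build a chat from role-specific factory methods
messages = [
    ChatMessage.from_system("You are a helpful, concise assistant."),
    ChatMessage.from_user("What is Llama Stack?"),
]
```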
### Tool Support
`LlamaStackChatGenerator` supports function calling through the `tools` parameter, which accepts flexible tool configurations:
- **A list of Tool objects**: Pass individual tools as a list
- **A single Toolset**: Pass an entire Toolset directly
- **Mixed Tools and Toolsets**: Combine multiple Toolsets with standalone tools in a single list
This allows you to organize related tools into logical groups while also including standalone tools as needed.
```python
from haystack.tools import Tool, Toolset
from haystack_integrations.components.generators.llama_stack import LlamaStackChatGenerator

# Create individual tools (`...` stands for the remaining required
# arguments, such as the tool's parameters schema and the function to call)
weather_tool = Tool(name="weather", description="Get weather info", ...)
news_tool = Tool(name="news", description="Get latest news", ...)

# Group related tools, defined the same way, into a toolset
math_toolset = Toolset([add_tool, subtract_tool, multiply_tool])

# Pass mixed tools and toolsets to the generator
generator = LlamaStackChatGenerator(
    model="ollama/llama3.2:3b",
    tools=[math_toolset, weather_tool, news_tool],  # Mix of Toolset and Tool objects
)
```
For more details on working with tools, see the [Tool](../../tools/tool.mdx) and [Toolset](../../tools/toolset.mdx) documentation.
## Initialization
To use this integration, you must have:
- A running instance of a Llama Stack server (local or remote)
- A valid model name supported by your selected inference provider
Then initialize the `LlamaStackChatGenerator` by specifying the `model` name or ID. The value depends on the inference provider running on your server.
**Examples:**
- For Ollama: `model="ollama/llama3.2:3b"`
- For vLLM: `model="meta-llama/Llama-3.2-3B"`
**Note:** Switching the inference provider only requires updating the model name.
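For example, initializing against a model served through Ollama is a one-liner; moving to vLLM would only change the `model` string:

```python
from haystack_integrations.components.generators.llama_stack import LlamaStackChatGenerator

# The model name depends on the inference provider configured on your Llama Stack server
generator = LlamaStackChatGenerator(model="ollama/llama3.2:3b")
```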
### Streaming
This Generator supports [streaming](guides-to-generators/choosing-the-right-generator.mdx#streaming-support) the tokens from the LLM directly in the output. To do so, pass a function to the `streaming_callback` init parameter.
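For quick experiments, Haystack's built-in `print_streaming_chunk` works out of the box (shown in the usage section below). You can also write your own callback; here is a minimal sketch of a custom callback that prints each chunk as it arrives (the function name `stream_to_stdout` is ours, not part of the API):

```python
from haystack.dataclasses import StreamingChunk
from haystack_integrations.components.generators.llama_stack import LlamaStackChatGenerator

def stream_to_stdout(chunk: StreamingChunk) -> None:
    # Each StreamingChunk carries the newly generated text in `content`
    print(chunk.content, end="", flush=True)

generator = LlamaStackChatGenerator(
    model="ollama/llama3.2:3b",
    streaming_callback=stream_to_stdout,
)
```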
## Usage
To start using this integration, install the package with:
```shell
pip install llama-stack-haystack
```
### On its own
```python
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.llama_stack import LlamaStackChatGenerator

client = LlamaStackChatGenerator(model="ollama/llama3.2:3b")

response = client.run(
    [ChatMessage.from_user("What are Agentic Pipelines? Be brief.")]
)
print(response["replies"])
```
#### With Streaming
```python
from haystack.dataclasses import ChatMessage
from haystack.components.generators.utils import print_streaming_chunk
from haystack_integrations.components.generators.llama_stack import LlamaStackChatGenerator

client = LlamaStackChatGenerator(
    model="ollama/llama3.2:3b",
    streaming_callback=print_streaming_chunk,
)

response = client.run(
    [ChatMessage.from_user("What are Agentic Pipelines? Be brief.")]
)
print(response["replies"])
```
### In a pipeline
```python
from haystack import Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.llama_stack import LlamaStackChatGenerator

prompt_builder = ChatPromptBuilder()
llm = LlamaStackChatGenerator(model="ollama/llama3.2:3b")

pipe = Pipeline()
pipe.add_component("builder", prompt_builder)
pipe.add_component("llm", llm)
pipe.connect("builder.prompt", "llm.messages")

messages = [
    ChatMessage.from_system("Give brief answers."),
    ChatMessage.from_user("Tell me about {{city}}"),
]

response = pipe.run(
    data={"builder": {"template": messages, "template_variables": {"city": "Berlin"}}}
)
print(response)
```