These changes are needed because there is currently no way to get
logging information about Streaming LLM requests/responses.
I decided to put the StreamStart event AFTER the first chunk so there
aren't false positives about connections/auth.
Closes#5730
---------
Co-authored-by: Eric Zhu <ekzhu@users.noreply.github.com>
This pull request introduces the integration of the `llama-cpp` library
into the `autogen-ext` package, with significant changes to the project
dependencies and the implementation of a new chat completion client. The
most important changes include updating the project dependencies, adding
a new module for the `LlamaCppChatCompletionClient`, and implementing
the client with various functionalities.
### Project Dependencies:
*
[`python/packages/autogen-ext/pyproject.toml`](diffhunk://#diff-095119d4420ff09059557bd25681211d1772c2be0fbe0ff2d551a3726eff1b4bR34-R38):
Added `llama-cpp-python` as a new dependency under the `llama-cpp`
section.
### New Module:
*
[`python/packages/autogen-ext/src/autogen_ext/models/llama_cpp/__init__.py`](diffhunk://#diff-42ae3ba17d51ca917634c4ea3c5969cf930297c288a783f8d9c126f2accef71dR1-R8):
Introduced the `LlamaCppChatCompletionClient` class and handled import
errors with a descriptive message for missing dependencies.
### Implementation of `LlamaCppChatCompletionClient`:
*
`python/packages/autogen-ext/src/autogen_ext/models/llama_cpp/_llama_cpp_completion_client.py`:
- Added the `LlamaCppChatCompletionClient` class with methods to
initialize the client, create chat completions, detect and execute
tools, and handle streaming responses.
- Included detailed logging for debugging purposes and implemented
methods to count tokens, track usage, and provide model information.…d
chat capabilities
<!-- Thank you for your contribution! Please review
https://microsoft.github.io/autogen/docs/Contribute before opening a
pull request. -->
<!-- Please add a reviewer to the assignee section when you create a PR.
If you don't have the access to it, we will shortly find a reviewer and
assign them to your PR. -->
## Why are these changes needed?
<!-- Please give a short summary of the change and the problem this
solves. -->
## Related issue number
<!-- For example: "Closes #1234" -->
## Checks
- [X ] I've included any doc changes needed for
https://microsoft.github.io/autogen/. See
https://microsoft.github.io/autogen/docs/Contribute#documentation to
build and test documentation locally.
- [X ] I've added tests (if relevant) corresponding to the changes
introduced in this PR.
- [ X] I've made sure all auto checks have passed.
---------
Co-authored-by: aribornstein <x@x.com>
Co-authored-by: Eric Zhu <ekzhu@users.noreply.github.com>
Co-authored-by: Ryan Sweet <rysweet@microsoft.com>
Resolves#5745
Also made sure to log LLMCallEvent from all builtin model clients, and
added unit test for coverage.
---------
Co-authored-by: Ryan Sweet <rysweet@microsoft.com>
Co-authored-by: Victor Dibia <victordibia@microsoft.com>
<!-- Thank you for your contribution! Please review
https://microsoft.github.io/autogen/docs/Contribute before opening a
pull request. -->
<!-- Please add a reviewer to the assignee section when you create a PR.
If you don't have the access to it, we will shortly find a reviewer and
assign them to your PR. -->
## Why are these changes needed?
<!-- Please give a short summary of the change and the problem this
solves. -->
The PR introduces two changes.
The first change is adding a name attribute to
`FunctionExecutionResult`. The motivation is that semantic kernel
requires it for their function result interface and it seemed like a
easy modification as `FunctionExecutionResult` is always created in the
context of a `FunctionCall` which will contain the name. I'm unsure if
there was a motivation to keep it out but this change makes it easier to
trace which tool the result refers to and also increases api
compatibility with SK.
The second change is an update to how messages are mapped from autogen
to semantic kernel, which includes an update/fix in the processing of
function results.
## Related issue number
<!-- For example: "Closes #1234" -->
Related to #5675 but wont fix the underlying issue of anthropic
requiring tools during AssistantAgent reflection.
## Checks
- [ ] I've included any doc changes needed for
<https://microsoft.github.io/autogen/>. See
<https://github.com/microsoft/autogen/blob/main/CONTRIBUTING.md> to
build and test documentation locally.
- [ ] I've added tests (if relevant) corresponding to the changes
introduced in this PR.
- [ ] I've made sure all auto checks have passed.
---------
Co-authored-by: Leonardo Pinheiro <lpinheiro@microsoft.com>
@ekzhu should likely be assigned as reviewer
## Why are these changes needed?
These changes address the bug reported in #5663. Prevents TypeError from
being thrown at inference time by ollama AsyncClient when `host` (and
other) kwargs are passed to autogen OllamaChatCompletionClient
constructor.
It also adds ollama as a named optional extra so that the ollama
requirements can be installed alongside autogen-ext (e.g. `pip install
autogen-ext[ollama]`
@ekzhu, I will need some help or guidance to ensure that the associated
test (which requires ollama and tiktoken as dependencies of the
OllamaChatCompletionClient) can run successfully in autogen's test
execution environment.
I have also left the "I've made sure all auto checks have passed" check
below unchecked as this PR is coming from my fork. (UPDATE: auto checks
appear to have passed after opening PR, so I have checked box below)
## Related issue number
Intended to close#5663
## Checks
- [x] I've included any doc changes needed for
<https://microsoft.github.io/autogen/>. See
<https://github.com/microsoft/autogen/blob/main/CONTRIBUTING.md> to
build and test documentation locally.
- [x] I've added tests (if relevant) corresponding to the changes
introduced in this PR.
- [x] I've made sure all auto checks have passed.
---------
Co-authored-by: Ryan Stewart <ryanstewart@Ryans-MacBook-Pro.local>
Co-authored-by: Jack Gerrits <jackgerrits@users.noreply.github.com>
Co-authored-by: peterychang <49209570+peterychang@users.noreply.github.com>
<!-- Thank you for your contribution! Please review
https://microsoft.github.io/autogen/docs/Contribute before opening a
pull request. -->
Claude 3.7 just came out. Its a pretty capable model and it would be
great to support it in Autogen.
This will could augment the already excellent support we have for
Anthropic via the SKAdapters in the following ways
- Based on the ChatCompletion API similar to the ollama and openai
client
- Configurable/serializable (can be dumped) .. this means it can be used
easily in AGS.
## What is Supported
(video below shows the client being used in autogen studio)
https://github.com/user-attachments/assets/8fb7c17c-9f9c-4525-aa9c-f256aad0f40b
- streaming
- tool callign / function calling
- drop in integration with assistant agent.
- multimodal support
```python
from dotenv import load_dotenv
import os
load_dotenv()
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.ui import Console
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_ext.models.anthropic import AnthropicChatCompletionClient
model_client = AnthropicChatCompletionClient(
model="claude-3-7-sonnet-20250219"
)
async def get_weather(city: str) -> str:
"""Get the weather for a given city."""
return f"The weather in {city} is 73 degrees and Sunny."
agent = AssistantAgent(
name="weather_agent",
model_client=model_client,
tools=[get_weather],
system_message="You are a helpful assistant.",
# model_client_stream=True,
)
# Run the agent and stream the messages to the console.
async def main() -> None:
await Console(agent.run_stream(task="What is the weather in New York?"))
await main()
```
result
```
messages = [
UserMessage(content="Write a very short story about a dragon.", source="user"),
]
# Create a stream.
stream = model_client.create_stream(messages=messages)
# Iterate over the stream and print the responses.
print("Streamed responses:")
async for response in stream: # type: ignore
if isinstance(response, str):
# A partial response is a string.
print(response, flush=True, end="")
else:
# The last response is a CreateResult object with the complete message.
print("\n\n------------\n")
print("The complete response:", flush=True)
print(response.content, flush=True)
print("\n\n------------\n")
print("The token usage was:", flush=True)
print(response.usage, flush=True)
```
<!-- Please add a reviewer to the assignee section when you create a PR.
If you don't have the access to it, we will shortly find a reviewer and
assign them to your PR. -->
## Why are these changes needed?
<!-- Please give a short summary of the change and the problem this
solves. -->
## Related issue number
<!-- For example: "Closes #1234" -->
Closes#5205Closes#5708
## Checks
- [ ] I've included any doc changes needed for
<https://microsoft.github.io/autogen/>. See
<https://github.com/microsoft/autogen/blob/main/CONTRIBUTING.md> to
build and test documentation locally.
- [ ] I've added tests (if relevant) corresponding to the changes
introduced in this PR.
- [ ] I've made sure all auto checks have passed.
cc @rohanthacker
<!-- Thank you for your contribution! Please review
https://microsoft.github.io/autogen/docs/Contribute before opening a
pull request. -->
This PR makes makes ChatCompletionCache support component config
<!-- Please add a reviewer to the assignee section when you create a PR.
If you don't have the access to it, we will shortly find a reviewer and
assign them to your PR. -->
## Why are these changes needed?
Ensures we have a path to serializing ChatCompletionCache , similar to
the ChatCompletion client that it wraps.
This PR does the following
- Makes CacheStore serializable first (part of this includes converting
from Protocol to base class). Makes it's derivatives serializable as
well (diskcache, redis)
- Makes ChatCompletionCache serializable
- Adds some tests
<!-- Please give a short summary of the change and the problem this
solves. -->
## Related issue number
<!-- For example: "Closes #1234" -->
Closes#5141
## Checks
- [ ] I've included any doc changes needed for
<https://microsoft.github.io/autogen/>. See
<https://github.com/microsoft/autogen/blob/main/CONTRIBUTING.md> to
build and test documentation locally.
- [ ] I've added tests (if relevant) corresponding to the changes
introduced in this PR.
- [ ] I've made sure all auto checks have passed.
cc @nour-bouzid
Resolves#5192
Test
```python
import asyncio
import os
from random import randint
from typing import List
from autogen_core.tools import BaseTool, FunctionTool
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.ui import Console
async def get_current_time(city: str) -> str:
return f"The current time in {city} is {randint(0, 23)}:{randint(0, 59)}."
tools: List[BaseTool] = [
FunctionTool(
get_current_time,
name="get_current_time",
description="Get current time for a city.",
),
]
model_client = OpenAIChatCompletionClient(
model="anthropic/claude-3.5-haiku-20241022",
base_url="https://openrouter.ai/api/v1",
api_key=os.environ["OPENROUTER_API_KEY"],
model_info={
"family": "claude-3.5-haiku",
"function_calling": True,
"vision": False,
"json_output": False,
}
)
agent = AssistantAgent(
name="Agent",
model_client=model_client,
tools=tools,
system_message= "You are an assistant with some tools that can be used to answer some questions",
)
async def main() -> None:
await Console(agent.run_stream(task="What is current time of Paris and Toronto?"))
asyncio.run(main())
```
```
---------- user ----------
What is current time of Paris and Toronto?
---------- Agent ----------
I'll help you find the current time for Paris and Toronto by using the get_current_time function for each city.
---------- Agent ----------
[FunctionCall(id='toolu_01NwP3fNAwcYKn1x656Dq9xW', arguments='{"city": "Paris"}', name='get_current_time'), FunctionCall(id='toolu_018d4cWSy3TxXhjgmLYFrfRt', arguments='{"city": "Toronto"}', name='get_current_time')]
---------- Agent ----------
[FunctionExecutionResult(content='The current time in Paris is 1:10.', call_id='toolu_01NwP3fNAwcYKn1x656Dq9xW', is_error=False), FunctionExecutionResult(content='The current time in Toronto is 7:28.', call_id='toolu_018d4cWSy3TxXhjgmLYFrfRt', is_error=False)]
---------- Agent ----------
The current time in Paris is 1:10.
The current time in Toronto is 7:28.
```
---------
Co-authored-by: Jack Gerrits <jackgerrits@users.noreply.github.com>
<!-- Thank you for your contribution! Please review
https://microsoft.github.io/autogen/docs/Contribute before opening a
pull request. -->
<!-- Please add a reviewer to the assignee section when you create a PR.
If you don't have the access to it, we will shortly find a reviewer and
assign them to your PR. -->
## Why are these changes needed?
<!-- Please give a short summary of the change and the problem this
solves. -->
The current stream processing of SK model adapter returns on the first
function call chunk but this behavior is incorrect end ends up returning
with an incomplete function call. The observed behavior is that the
function name and arguments are split into different chunks and this
update correctly processes the chunks in this way.
## Related issue number
<!-- For example: "Closes #1234" -->
Fixes the reply in #5420
## Checks
- [ ] I've included any doc changes needed for
https://microsoft.github.io/autogen/. See
https://microsoft.github.io/autogen/docs/Contribute#documentation to
build and test documentation locally.
- [ ] I've added tests (if relevant) corresponding to the changes
introduced in this PR.
- [ ] I've made sure all auto checks have passed.
---------
Co-authored-by: Leonardo Pinheiro <lpinheiro@microsoft.com>
<!-- Thank you for your contribution! Please review
https://microsoft.github.io/autogen/docs/Contribute before opening a
pull request. -->
<!-- Please add a reviewer to the assignee section when you create a PR.
If you don't have the access to it, we will shortly find a reviewer and
assign them to your PR. -->
## Why are these changes needed?
<!-- Please give a short summary of the change and the problem this
solves. -->
Semantic kernel prepends the plugin name to the tool name when passing
the tools to model clients and this is causing a mismatch between tool
names in SK and the AssistantAgent. Since plugin names are optional, we
have opted to remove it.
## Related issue number
<!-- For example: "Closes #1234" -->
Closes#5420
## Checks
- [ ] I've included any doc changes needed for
https://microsoft.github.io/autogen/. See
https://microsoft.github.io/autogen/docs/Contribute#documentation to
build and test documentation locally.
- [ ] I've added tests (if relevant) corresponding to the changes
introduced in this PR.
- [ ] I've made sure all auto checks have passed.
---------
Co-authored-by: Leonardo Pinheiro <lpinheiro@microsoft.com>
Resolves#3983
* introduce `model_client_stream` parameter in `AssistantAgent` to
enable token-level streaming output.
* introduce `ModelClientStreamingChunkEvent` as a type of `AgentEvent`
to pass the streaming chunks to the application via `run_stream` and
`on_messages_stream`. Although this will not affect the inner messages
list in the final `Response` or `TaskResult`.
* handle this new message type in `Console`.
* Rebase to latest main branch
* Moved _azure module to azure
* Validate extra_create_args in and json response
* Added Support for Github Models
* Added normalize_name and assert_valid name
* Added Tests for AzureAIChatCompletionClient
* WIP: Azure AI Client
* Added: object-level usage data
* Added: doc string
* Added: check existing response_format value
* Added: _validate_config and _create_client
* lint
* merge dependencies
* add tests for img and function calling
* support actual tests through env vars
* address mypy errors
* doc example fix
* fmt
* fix doc fmt
* Update python/packages/autogen-ext/src/autogen_ext/models/azure/_azure_ai_client.py
---------
Co-authored-by: Rohan Thacker <thackerrohan4@gmail.com>
Co-authored-by: Eric Zhu <ekzhu@users.noreply.github.com>
Co-authored-by: Leonardo Pinheiro <lpinheiro@microsoft.com>
* Add ChatCompletionCache along with AbstractStore for caching completions
* Addressing comments
* Improve interface for cachestore
* Improve documentation & revert protocol
* Make cache store typed, and improve docs
* remove unnecessary casts
1. convert dataclass types to pydantic basemodel
2. add save_state and load_state for ChatAgent
3. state types for AgentChat
---------
Co-authored-by: Eric Zhu <ekzhu@users.noreply.github.com>