
# AutoGen-Core Streaming Chat API with FastAPI
This sample demonstrates how to build a streaming chat API with multi-turn conversation history using `autogen-core` and FastAPI.
## Key Features

- **Streaming Response**: Implements real-time streaming of LLM responses by combining FastAPI's `StreamingResponse`, `autogen-core`'s asynchronous features, and a global queue created with `asyncio.Queue()` to manage the data stream, providing faster user-perceived response times. A minimal sketch of this pattern follows the list.
- **Multi-Turn Conversation**: The agent (`MyAgent`) can receive and process a chat history (`ChatHistory`) containing multiple turns of interaction, enabling context-aware continuous conversations.
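To make the streaming mechanics concrete, here is a minimal, self-contained sketch of that queue-based pattern. All names other than `StreamingResponse` and `asyncio.Queue` are illustrative (the real agent and endpoint logic live in `app.py`), and the fake producer stands in for the `autogen-core` agent:

```python
import asyncio
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()
response_queue: asyncio.Queue = asyncio.Queue()  # one global queue, as in the sample
STREAM_DONE = None  # sentinel placed on the queue when a response is complete


async def fake_agent() -> None:
    # Stand-in for the autogen-core agent: the real agent pushes each LLM
    # chunk onto the queue as it arrives, then signals completion.
    for word in ["Hello", ", ", "world", "!"]:
        await response_queue.put(word)
    await response_queue.put(STREAM_DONE)


async def drain_queue():
    # Yield one small JSON object per chunk until the producer is done.
    while (item := await response_queue.get()) is not STREAM_DONE:
        yield json.dumps({"content": item})


@app.post("/chat/completions")
async def chat_completions() -> StreamingResponse:
    asyncio.create_task(fake_agent())  # producer runs concurrently with the response
    return StreamingResponse(drain_queue(), media_type="application/json")
```

A single global queue keeps the demo simple and mirrors the sample's design; with concurrent clients, the chunks of different responses would interleave, so a per-request queue is the natural refinement.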
## File Structure

- `app.py`: FastAPI application code, including API endpoints, agent definitions, runtime settings, and streaming logic.
- `README.md`: (This document) Project introduction and usage instructions.
## Installation

First, make sure you have Python installed (3.8 or higher is recommended). Then, in your project directory, install the necessary libraries via pip:

```bash
pip install "fastapi" "uvicorn[standard]" "autogen-core" "autogen-ext[openai]"
```
## Configuration

Create a new file named `model_config.yaml` in the same directory as this README file to configure your model settings. See `model_config_template.yaml` for an example.
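For orientation, a typical OpenAI-based configuration might look like the sketch below; treat `model_config_template.yaml` as the authoritative format, and the model name and key here as placeholders:

```yaml
# Sketch only; copy model_config_template.yaml for the real schema.
provider: autogen_ext.models.openai.OpenAIChatCompletionClient
config:
  model: gpt-4o
  api_key: REPLACE_WITH_YOUR_API_KEY
```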
> **Note:** Hardcoding API keys directly in the code is only suitable for local testing. For production environments, it is strongly recommended to use environment variables or other secure methods to manage keys.
## Running the Application

In the directory containing `app.py`, run the following command to start the FastAPI application:

```bash
uvicorn app:app --host 0.0.0.0 --port 8501 --reload
```
After the service starts, the API endpoint will be available at `http://<your-server-ip>:8501/chat/completions`.
## Using the API

You can interact with the agent by sending a POST request to the `/chat/completions` endpoint. The request body must be in JSON format and contain a `messages` field, whose value is a list in which each element represents one turn of the conversation.

**Request Body Format:**
```json
{
    "messages": [
        {"source": "user", "content": "Hello!"},
        {"source": "assistant", "content": "Hello! How can I help you?"},
        {"source": "user", "content": "Introduce yourself."}
    ]
}
```
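On the server side, a body like this maps naturally onto Pydantic models along the following lines. This is a sketch with field names taken from the JSON above; the actual `ChatHistory` definition lives in `app.py` and may differ:

```python
from typing import List

from pydantic import BaseModel


class ChatMessage(BaseModel):
    source: str  # "user" or "assistant"
    content: str


class ChatHistory(BaseModel):
    messages: List[ChatMessage]
```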
**Example (using `curl`):**
```bash
curl -N -X POST http://localhost:8501/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"source": "user", "content": "Hello, I'\''m Tory."},
      {"source": "assistant", "content": "Hello Tory, nice to meet you!"},
      {"source": "user", "content": "Say hello by my name and introduce yourself."}
    ]
  }'
```
**Example (using Python `requests`):**
```python
import json

import requests

url = "http://localhost:8501/chat/completions"
data = {
    "stream": True,
    "messages": [
        {"source": "user", "content": "Hello, I'm Tory."},
        {"source": "assistant", "content": "Hello Tory, nice to meet you!"},
        {"source": "user", "content": "Say hello by my name and introduce yourself."},
    ],
}
headers = {"Content-Type": "application/json"}

try:
    # stream=True keeps the connection open so chunks can be read as they arrive.
    response = requests.post(url, json=data, headers=headers, stream=True)
    response.raise_for_status()
    for chunk in response.iter_content(chunk_size=None):
        if chunk:
            # Each chunk is expected to be one JSON object with a "content" field.
            print(json.loads(chunk)["content"], end="", flush=True)
except requests.exceptions.RequestException as e:
    print(f"Error: {e}")
except json.JSONDecodeError as e:
    print(f"JSON Decode Error: {e}")
```