
# AutoGen-Core Streaming Chat API with FastAPI
This sample demonstrates how to build a streaming chat API with multi-turn conversation history using `autogen-core` and FastAPI.
## Key Features

- **Streaming Response**: Implements real-time streaming of LLM responses by combining FastAPI's `StreamingResponse`, `autogen-core`'s asynchronous features, and a global queue created with `asyncio.Queue()` to manage the data stream, providing faster user-perceived response times. A minimal sketch of this pattern follows the list.
- **Multi-Turn Conversation**: The agent (`MyAgent`) can receive and process a chat history (`ChatHistory`) containing multiple turns of interaction, enabling context-aware continuous conversations.
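To make the streaming mechanics concrete, here is a minimal, self-contained sketch of that queue-based pattern. All names other than `StreamingResponse` and `asyncio.Queue` are illustrative (the real agent and endpoint logic live in `app.py`), and the fake producer stands in for the `autogen-core` agent:

```python
import asyncio
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()
response_queue: asyncio.Queue = asyncio.Queue()  # one global queue, as in the sample
STREAM_DONE = None  # sentinel placed on the queue when a response is complete


async def fake_agent() -> None:
    # Stand-in for the autogen-core agent: the real agent pushes each LLM
    # chunk onto the queue as it arrives, then signals completion.
    for word in ["Hello", ", ", "world", "!"]:
        await response_queue.put(word)
    await response_queue.put(STREAM_DONE)


async def drain_queue():
    # Yield one small JSON object per chunk until the producer is done.
    while (item := await response_queue.get()) is not STREAM_DONE:
        yield json.dumps({"content": item})


@app.post("/chat/completions")
async def chat_completions() -> StreamingResponse:
    asyncio.create_task(fake_agent())  # producer runs concurrently with the response
    return StreamingResponse(drain_queue(), media_type="application/json")
```

A single global queue keeps the demo simple and mirrors the sample's design; with concurrent clients, the chunks of different responses would interleave, so a per-request queue is the natural refinement.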
## File Structure

- `app.py`: FastAPI application code, including API endpoints, agent definitions, runtime settings, and streaming logic.
- `README.md`: (This document) Project introduction and usage instructions.
## Installation

First, make sure you have Python installed (3.8 or higher is recommended). Then, in your project directory, install the necessary libraries via pip:

```bash
pip install "fastapi" "uvicorn[standard]" "autogen-core" "autogen-ext[openai]"
```
## Configuration

Create a new file named `model_config.yaml` in the same directory as this README file to configure your model settings. See `model_config_template.yaml` for an example.
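For orientation, a typical OpenAI-based configuration might look like the sketch below; treat `model_config_template.yaml` as the authoritative format, and the model name and key here as placeholders:

```yaml
# Sketch only; copy model_config_template.yaml for the real schema.
provider: autogen_ext.models.openai.OpenAIChatCompletionClient
config:
  model: gpt-4o
  api_key: REPLACE_WITH_YOUR_API_KEY
```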
> **Note:** Hardcoding API keys directly in the code is only suitable for local testing. For production environments, it is strongly recommended to use environment variables or other secure methods to manage keys.
## Running the Application

In the directory containing `app.py`, run the following command to start the FastAPI application:

```bash
uvicorn app:app --host 0.0.0.0 --port 8501 --reload
```
After the service starts, the API endpoint will be available at `http://<your-server-ip>:8501/chat/completions`.
## Using the API

You can interact with the agent by sending a POST request to the `/chat/completions` endpoint. The request body must be in JSON format and contain a `messages` field, whose value is a list in which each element represents one turn of the conversation.

**Request Body Format:**
```json
{
    "messages": [
        {"source": "user", "content": "Hello!"},
        {"source": "assistant", "content": "Hello! How can I help you?"},
        {"source": "user", "content": "Introduce yourself."}
    ]
}
```
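On the server side, a body like this maps naturally onto Pydantic models along the following lines. This is a sketch with field names taken from the JSON above; the actual `ChatHistory` definition lives in `app.py` and may differ:

```python
from typing import List

from pydantic import BaseModel


class ChatMessage(BaseModel):
    source: str  # "user" or "assistant"
    content: str


class ChatHistory(BaseModel):
    messages: List[ChatMessage]
```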
**Example (using `curl`):**
```bash
curl -N -X POST http://localhost:8501/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"source": "user", "content": "Hello, I'\''m Tory."},
      {"source": "assistant", "content": "Hello Tory, nice to meet you!"},
      {"source": "user", "content": "Say hello by my name and introduce yourself."}
    ]
  }'
```
**Example (using Python `requests`):**
```python
import json

import requests

url = "http://localhost:8501/chat/completions"
data = {
    "stream": True,
    "messages": [
        {"source": "user", "content": "Hello, I'm Tory."},
        {"source": "assistant", "content": "Hello Tory, nice to meet you!"},
        {"source": "user", "content": "Say hello by my name and introduce yourself."},
    ],
}
headers = {"Content-Type": "application/json"}

try:
    # stream=True keeps the connection open so chunks can be read as they arrive.
    response = requests.post(url, json=data, headers=headers, stream=True)
    response.raise_for_status()
    for chunk in response.iter_content(chunk_size=None):
        if chunk:
            # Each chunk is expected to be one JSON object with a "content" field.
            print(json.loads(chunk)["content"], end="", flush=True)
except requests.exceptions.RequestException as e:
    print(f"Error: {e}")
except json.JSONDecodeError as e:
    print(f"JSON Decode Error: {e}")
```