mirror of
https://github.com/microsoft/autogen.git
synced 2025-07-25 01:41:01 +00:00

* update * update * Update notebook/oai_client_cost.ipynb Co-authored-by: Chi Wang <wang.chi@microsoft.com> * update doc and test --------- Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu> Co-authored-by: Chi Wang <wang.chi@microsoft.com>
526 lines
19 KiB
Plaintext
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/microsoft/autogen/blob/main/notebook/oai_client_cost.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Copyright (c) Microsoft Corporation. All rights reserved. \n",
    "\n",
    "Licensed under the MIT License.\n",
    "\n",
    "# Usage tracking with AutoGen\n",
    "## 1. Use AutoGen's OpenAIWrapper for cost estimation\n",
    "The `OpenAIWrapper` from `autogen` tracks token counts and costs of your API calls. Use the `create()` method to initiate requests and `print_usage_summary()` to retrieve a detailed usage report, including total cost and token usage for both cached and actual requests.\n",
    "\n",
    "- `mode=[\"actual\", \"total\"]` (default): print the usage summary for non-cached completions and for all completions (including cached ones).\n",
    "- `mode='actual'`: print only non-cached usage.\n",
    "- `mode='total'`: print only total usage (including cached completions).\n",
    "\n",
    "Reset your session's usage data with `clear_usage_summary()` when needed.\n",
    "\n",
    "## 2. Track cost and token count for agents\n",
    "We also support cost estimation for agents. Use `Agent.print_usage_summary()` to print the cost summary for the agent.\n",
    "You can retrieve the usage summary as a dict with `Agent.get_actual_usage()` and `Agent.get_total_usage()`. Note that `Agent.reset()` also resets the usage summary.\n",
    "\n",
    "To gather usage data for a list of agents, we provide a utility function, `autogen.agent_utils.gather_usage_summary(agents)`, which takes a list of agents and returns their combined usage summary.\n",
    "\n",
    "## Caution when using Azure OpenAI!\n",
    "If you are using Azure OpenAI, the model name returned with a completion doesn't include version information: it is either 'gpt-35-turbo' or 'gpt-4'. We therefore calculate the cost using the prices of gpt-3.5-turbo-0613 (0.0015 and 0.002 per 1k prompt and completion tokens, respectively) and gpt-4-0613 (0.03 and 0.06). This means the cost is wrong if you are using the 1106 version of the models from Azure OpenAI.\n",
    "\n",
    "This will be improved in the future. However, the token count summary is accurate. You can use the token count to calculate the cost yourself.\n",
    "\n",
    "## Requirements\n",
    "\n",
    "AutoGen requires `Python>=3.8`. Install it with:\n",
    "```bash\n",
    "pip install \"pyautogen\"\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Set your API Endpoint\n",
    "\n",
    "The [`config_list_from_json`](https://microsoft.github.io/autogen/docs/reference/oai/openai_utils#config_list_from_json) function loads a list of configurations from an environment variable or a JSON file.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import autogen\n",
    "from autogen import OpenAIWrapper\n",
    "from autogen import AssistantAgent, UserProxyAgent\n",
    "from autogen.agent_utils import gather_usage_summary\n",
    "\n",
    "# config_list = autogen.config_list_from_json(\n",
    "#     \"OAI_CONFIG_LIST\",\n",
    "#     filter_dict={\n",
    "#         \"model\": [\"gpt-3.5-turbo\", \"gpt-4-1106-preview\"],\n",
    "#     },\n",
    "# )\n",
    "\n",
    "config_list = autogen.config_list_from_json(\n",
    "    \"OAI_CONFIG_LIST\",\n",
    "    filter_dict={\n",
    "        \"model\": [\"gpt-3.5-turbo\", \"gpt-35-turbo\"],\n",
    "    },\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`config_list_from_json` first looks for the environment variable \"OAI_CONFIG_LIST\", which needs to be a valid JSON string. If that variable is not found, it then looks for a JSON file named \"OAI_CONFIG_LIST\". It filters the configs by model (you can filter by other keys as well).\n",
    "\n",
    "The config list looks like the following:\n",
    "```python\n",
    "config_list = [\n",
    "    {\n",
    "        \"model\": \"gpt-4\",\n",
    "        \"api_key\": \"<your OpenAI API key>\",\n",
    "    },  # OpenAI API endpoint for gpt-4\n",
    "    {\n",
    "        \"model\": \"gpt-35-turbo-0613\",  # 0613 or newer is needed to use functions\n",
    "        \"base_url\": \"<your Azure OpenAI API base>\",\n",
    "        \"api_type\": \"azure\",\n",
    "        \"api_version\": \"2023-08-01-preview\",  # 2023-07-01-preview or newer is needed to use functions\n",
    "        \"api_key\": \"<your Azure OpenAI API key>\"\n",
    "    }\n",
    "]\n",
    "```\n",
    "\n",
    "You can set the value of `config_list` in any way you prefer. Please refer to this [notebook](https://github.com/microsoft/autogen/blob/main/notebook/oai_openai_utils.ipynb) for full code examples of the different methods."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## OpenAIWrapper with cost estimation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.0003535\n"
     ]
    }
   ],
   "source": [
    "client = OpenAIWrapper(config_list=config_list)\n",
    "messages = [\n",
    "    {\"role\": \"user\", \"content\": \"Can you give me 3 useful tips on learning Python? Keep it simple and short.\"},\n",
    "]\n",
    "response = client.create(messages=messages, model=\"gpt-3.5-turbo\", cache_seed=None)\n",
    "print(response.cost)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Usage Summary for OpenAIWrapper\n",
    "\n",
    "When you create an instance of `OpenAIWrapper`, the cost of all completions made through that instance is recorded. You can call `print_usage_summary()` to check your usage summary. To clear it, use `clear_usage_summary()`.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "No usage summary. Please call \"create\" first.\n"
     ]
    }
   ],
   "source": [
    "client = OpenAIWrapper(config_list=config_list)\n",
    "messages = [\n",
    "    {\"role\": \"user\", \"content\": \"Can you give me 3 useful tips on learning Python? Keep it simple and short.\"},\n",
    "]\n",
    "client.print_usage_summary()  # print usage summary"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "In update_usage_summary\n",
      "----------------------------------------------------------------------------------------------------\n",
      "Usage summary excluding cached usage: \n",
      "Total cost: 0.00026\n",
      "* Model 'gpt-35-turbo': cost: 0.00026, prompt_tokens: 25, completion_tokens: 110, total_tokens: 135\n",
      "\n",
      "All completions are non-cached: the total cost with cached completions is the same as actual cost.\n",
      "----------------------------------------------------------------------------------------------------\n",
      "----------------------------------------------------------------------------------------------------\n",
      "Usage summary excluding cached usage: \n",
      "Total cost: 0.00026\n",
      "* Model 'gpt-35-turbo': cost: 0.00026, prompt_tokens: 25, completion_tokens: 110, total_tokens: 135\n",
      "----------------------------------------------------------------------------------------------------\n",
      "----------------------------------------------------------------------------------------------------\n",
      "Usage summary including cached usage: \n",
      "Total cost: 0.00026\n",
      "* Model 'gpt-35-turbo': cost: 0.00026, prompt_tokens: 25, completion_tokens: 110, total_tokens: 135\n",
      "----------------------------------------------------------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "# The first completion request\n",
    "# By default, cache_seed is set to 41 and caching is enabled. If you don't want to use the cache, set cache_seed to None.\n",
    "response = client.create(messages=messages, model=\"gpt-35-turbo-1106\", cache_seed=41)\n",
    "client.print_usage_summary()  # defaults to [\"actual\", \"total\"]\n",
    "client.print_usage_summary(mode=\"actual\")  # print actual usage summary\n",
    "client.print_usage_summary(mode=\"total\")  # print total usage summary"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'total_cost': 0.0002575, 'gpt-35-turbo': {'cost': 0.0002575, 'prompt_tokens': 25, 'completion_tokens': 110, 'total_tokens': 135}}\n",
      "{'total_cost': 0.0002575, 'gpt-35-turbo': {'cost': 0.0002575, 'prompt_tokens': 25, 'completion_tokens': 110, 'total_tokens': 135}}\n"
     ]
    }
   ],
   "source": [
    "# Retrieve the usage summaries as dicts\n",
    "print(client.actual_usage_summary)\n",
    "print(client.total_usage_summary)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "In update_usage_summary\n",
      "----------------------------------------------------------------------------------------------------\n",
      "Usage summary excluding cached usage: \n",
      "Total cost: 0.00026\n",
      "* Model 'gpt-35-turbo': cost: 0.00026, prompt_tokens: 25, completion_tokens: 110, total_tokens: 135\n",
      "\n",
      "Usage summary including cached usage: \n",
      "Total cost: 0.00052\n",
      "* Model 'gpt-35-turbo': cost: 0.00052, prompt_tokens: 50, completion_tokens: 220, total_tokens: 270\n",
      "----------------------------------------------------------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "# Since the cache is enabled, the same completion is returned from the cache, which does not incur any actual cost.\n",
    "# So the actual cost doesn't change, but the total cost doubles.\n",
    "response = client.create(messages=messages, model=\"gpt-35-turbo-1106\", cache_seed=41)\n",
    "client.print_usage_summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "No usage summary. Please call \"create\" first.\n"
     ]
    }
   ],
   "source": [
    "# Clear the usage summary\n",
    "client.clear_usage_summary()\n",
    "client.print_usage_summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "In update_usage_summary\n",
      "----------------------------------------------------------------------------------------------------\n",
      "No actual cost incurred (all completions are using cache).\n",
      "\n",
      "Usage summary including cached usage: \n",
      "Total cost: 0.00026\n",
      "* Model 'gpt-35-turbo': cost: 0.00026, prompt_tokens: 25, completion_tokens: 110, total_tokens: 135\n",
      "----------------------------------------------------------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "# All completions are returned from the cache, so no actual cost is incurred.\n",
    "response = client.create(messages=messages, model=\"gpt-35-turbo-1106\", cache_seed=41)\n",
    "client.print_usage_summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Usage Summary for Agents\n",
    "\n",
    "- `Agent.print_usage_summary()` prints the cost summary for the agent.\n",
    "- `Agent.get_actual_usage()` and `Agent.get_total_usage()` return the usage summary as a dict. When an agent doesn't use an LLM, they return None.\n",
    "- `Agent.reset()` resets the usage summary.\n",
    "- `autogen.agent_utils.gather_usage_summary` gathers the usage summary for a list of agents."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[33mai_user\u001b[0m (to assistant):\n",
      "\n",
      "$x^3=125$. What is x?\n",
      "\n",
      "--------------------------------------------------------------------------------\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[33massistant\u001b[0m (to ai_user):\n",
      "\n",
      "To find the value of x, we need to find the cube root of 125. \n",
      "\n",
      "The cube root of 125 is 5. \n",
      "\n",
      "Therefore, x = 5.\n",
      "\n",
      "--------------------------------------------------------------------------------\n",
      "\u001b[33mai_user\u001b[0m (to assistant):\n",
      "\n",
      "Great job! Your answer is correct.\n",
      "\n",
      "Indeed, to find the value of x in the equation $x^3 = 125$, we need to find the cube root of 125. The cube root of 125 is indeed 5.\n",
      "\n",
      "Therefore, x = 5 is the correct solution. Well done!\n",
      "\n",
      "--------------------------------------------------------------------------------\n",
      "\u001b[33massistant\u001b[0m (to ai_user):\n",
      "\n",
      "Thank you! I'm glad I could assist you. If you have any more questions, feel free to ask.\n",
      "\n",
      "--------------------------------------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "\n",
    "assistant = AssistantAgent(\n",
    "    \"assistant\",\n",
    "    system_message=\"You are a helpful assistant.\",\n",
    "    llm_config={\n",
    "        \"timeout\": 600,\n",
    "        \"cache_seed\": None,\n",
    "        \"config_list\": config_list,\n",
    "    },\n",
    ")\n",
    "\n",
    "ai_user_proxy = UserProxyAgent(\n",
    "    name=\"ai_user\",\n",
    "    human_input_mode=\"NEVER\",\n",
    "    max_consecutive_auto_reply=1,\n",
    "    code_execution_config=False,\n",
    "    llm_config={\n",
    "        \"config_list\": config_list,\n",
    "    },\n",
    "    # In the system message the \"user\" always refers to the other agent.\n",
    "    system_message=\"You ask a user for help. You check the answer from the user and provide feedback.\",\n",
    ")\n",
    "assistant.reset()\n",
    "\n",
    "math_problem = \"$x^3=125$. What is x?\"\n",
    "ai_user_proxy.initiate_chat(\n",
    "    assistant,\n",
    "    message=math_problem,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Agent 'ai_user':\n",
      "----------------------------------------------------------------------------------------------------\n",
      "Usage summary excluding cached usage: \n",
      "Total cost: 0.00025\n",
      "* Model 'gpt-35-turbo': cost: 0.00025, prompt_tokens: 80, completion_tokens: 63, total_tokens: 143\n",
      "\n",
      "All completions are non-cached: the total cost with cached completions is the same as actual cost.\n",
      "----------------------------------------------------------------------------------------------------\n",
      "\n",
      "Agent 'assistant':\n",
      "----------------------------------------------------------------------------------------------------\n",
      "Usage summary excluding cached usage: \n",
      "Total cost: 0.00036\n",
      "* Model 'gpt-35-turbo': cost: 0.00036, prompt_tokens: 162, completion_tokens: 60, total_tokens: 222\n",
      "\n",
      "All completions are non-cached: the total cost with cached completions is the same as actual cost.\n",
      "----------------------------------------------------------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "ai_user_proxy.print_usage_summary()\n",
    "print()\n",
    "assistant.print_usage_summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "No cost incurred from agent 'user'.\n"
     ]
    }
   ],
   "source": [
    "user_proxy = UserProxyAgent(\n",
    "    name=\"user\",\n",
    "    human_input_mode=\"NEVER\",\n",
    "    max_consecutive_auto_reply=2,\n",
    "    code_execution_config=False,\n",
    "    default_auto_reply=\"That's all. Thank you.\",\n",
    ")\n",
    "user_proxy.print_usage_summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Actual usage summary for assistant (excluding completion from cache): {'total_cost': 0.00036300000000000004, 'gpt-35-turbo': {'cost': 0.00036300000000000004, 'prompt_tokens': 162, 'completion_tokens': 60, 'total_tokens': 222}}\n",
      "Total usage summary for assistant (including completion from cache): {'total_cost': 0.00036300000000000004, 'gpt-35-turbo': {'cost': 0.00036300000000000004, 'prompt_tokens': 162, 'completion_tokens': 60, 'total_tokens': 222}}\n",
      "Actual usage summary for ai_user_proxy: {'total_cost': 0.000246, 'gpt-35-turbo': {'cost': 0.000246, 'prompt_tokens': 80, 'completion_tokens': 63, 'total_tokens': 143}}\n",
      "Total usage summary for ai_user_proxy: {'total_cost': 0.000246, 'gpt-35-turbo': {'cost': 0.000246, 'prompt_tokens': 80, 'completion_tokens': 63, 'total_tokens': 143}}\n",
      "Actual usage summary for user_proxy: None\n",
      "Total usage summary for user_proxy: None\n"
     ]
    }
   ],
   "source": [
    "print(\"Actual usage summary for assistant (excluding completion from cache):\", assistant.get_actual_usage())\n",
    "print(\"Total usage summary for assistant (including completion from cache):\", assistant.get_total_usage())\n",
    "\n",
    "print(\"Actual usage summary for ai_user_proxy:\", ai_user_proxy.get_actual_usage())\n",
    "print(\"Total usage summary for ai_user_proxy:\", ai_user_proxy.get_total_usage())\n",
    "\n",
    "print(\"Actual usage summary for user_proxy:\", user_proxy.get_actual_usage())\n",
    "print(\"Total usage summary for user_proxy:\", user_proxy.get_total_usage())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'total_cost': 0.0006090000000000001,\n",
       " 'gpt-35-turbo': {'cost': 0.0006090000000000001,\n",
       " 'prompt_tokens': 242,\n",
       " 'completion_tokens': 123,\n",
       " 'total_tokens': 365}}"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "total_usage_summary, actual_usage_summary = gather_usage_summary([assistant, ai_user_proxy, user_proxy])\n",
    "total_usage_summary"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "msft",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.18"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
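
The Azure OpenAI caution in this notebook notes that the token counts are accurate even when the reported cost is not, and suggests calculating the cost yourself. A minimal sketch of that arithmetic follows, in plain Python with no AutoGen calls. The names `PRICE_PER_1K`, `estimate_cost`, and `merge_usage` are hypothetical helpers introduced here for illustration, not part of AutoGen, and the prices are the 0613 rates quoted in the caution, which you would replace with the rates for your own deployment.

```python
from typing import Dict, Iterable, Optional

# Assumption: USD per 1,000 (prompt, completion) tokens, taken from the
# 0613 rates quoted in the caution. Substitute your deployment's rates.
PRICE_PER_1K = {
    "gpt-35-turbo": (0.0015, 0.002),
    "gpt-4": (0.03, 0.06),
}


def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  prompt_price: float, completion_price: float) -> float:
    """Return the cost in USD given token counts and per-1k-token prices."""
    return (prompt_tokens * prompt_price + completion_tokens * completion_price) / 1000


def merge_usage(summaries: Iterable[Optional[dict]]) -> Dict[str, Dict[str, int]]:
    """Sum per-model token counts across usage dicts, skipping None entries."""
    merged: Dict[str, Dict[str, int]] = {}
    for summary in summaries:
        if summary is None:  # agents that never called an LLM report None
            continue
        for model, usage in summary.items():
            if model == "total_cost":  # top-level cost entry, not a model
                continue
            bucket = merged.setdefault(
                model, {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
            )
            for key in bucket:
                bucket[key] += usage[key]
    return merged


# Token counts copied from the agent outputs shown in this notebook.
assistant_usage = {
    "total_cost": 0.000363,
    "gpt-35-turbo": {"cost": 0.000363, "prompt_tokens": 162, "completion_tokens": 60, "total_tokens": 222},
}
ai_user_usage = {
    "total_cost": 0.000246,
    "gpt-35-turbo": {"cost": 0.000246, "prompt_tokens": 80, "completion_tokens": 63, "total_tokens": 143},
}

tokens = merge_usage([assistant_usage, ai_user_usage, None])
for model, usage in tokens.items():
    prompt_price, completion_price = PRICE_PER_1K[model]
    cost = estimate_cost(usage["prompt_tokens"], usage["completion_tokens"], prompt_price, completion_price)
    print(f"{model}: {usage['total_tokens']} tokens, estimated cost ${cost:.6f}")
```

With the 0613 rates, the merged token counts (242 prompt, 123 completion) give the same total as the combined summary shown in the last cell. To re-price for a different model version, change only the entries in `PRICE_PER_1K`; the token counts stay as reported.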