{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a href=\"https://colab.research.google.com/github/microsoft/autogen/blob/main/notebook/oai_client_cost.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved. \n",
"\n",
"Licensed under the MIT License.\n",
"\n",
"# Usage tracking with AutoGen\n",
"## 1. Use AutoGen's OpenAIWrapper for cost estimation\n",
"The `OpenAIWrapper` from `autogen` tracks token counts and costs of your API calls. Use the `create()` method to initiate requests and `print_usage_summary()` to retrieve a detailed usage report, including total cost and token usage for both cached and actual requests.\n",
"\n",
"- `mode=[\"actual\", \"total\"]` (default): print usage summary for non-caching completions and all completions (including cache).\n",
"- `mode='actual'`: only print non-cached usage.\n",
"- `mode='total'`: only print all usage (including cache).\n",
"\n",
"Reset your session's usage data with `clear_usage_summary()` when needed.\n",
"\n",
"## 2. Track cost and token count for agents\n",
"We also support cost estimation for agents. Use `Agent.print_usage_summary()` to print the cost summary for the agent.\n",
"You can retrieve usage summary in a dict using `Agent.get_actual_usage()` and `Agent.get_total_usage()`. Note that `Agent.reset()` will also reset the usage summary.\n",
"\n",
"To gather usage data for a list of agents, we provide a utility function `autogen.gather_usage_summary(agents)`, which takes a list of agents and returns a combined usage summary.\n",
"\n",
"## 3. Custom token price for up-to-date cost estimation\n",
"AutoGen tries to keep the token prices up-to-date. However, you can pass a `price` field in `config_list` if the token price is missing or out of date. Please consider creating an issue or pull request to help us keep the token prices up-to-date!\n",
"\n",
"Note: in json files, the price should be a list of two floats.\n",
"\n",
"Example Usage:\n",
"```python\n",
"{\n",
" \"model\": \"gpt-3.5-turbo-xxxx\",\n",
" \"api_key\": \"YOUR_API_KEY\",\n",
" \"price\": [0.0005, 0.0015]\n",
"}\n",
"```\n",
"\n",
"## Caution when using Azure OpenAI!\n",
"If you are using Azure OpenAI, the model returned from a completion doesn't include version information; it is simply 'gpt-35-turbo' or 'gpt-4'. We then calculate the cost based on gpt-3.5-turbo-0125 ((0.0005, 0.0015) per 1K prompt and completion tokens) and gpt-4-0613 ((0.03, 0.06)). This means the estimated cost can be wrong if you are using a different model version on Azure OpenAI.\n",
"\n",
"This will be improved in the future. The token count summary, however, is accurate, so you can use the token counts to calculate the cost yourself.\n",
"\n",
"## Requirements\n",
"\n",
"AutoGen requires `Python>=3.8`:\n",
"```bash\n",
"pip install \"pyautogen\"\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set your API Endpoint\n",
"\n",
"The [`config_list_from_json`](https://microsoft.github.io/autogen/docs/reference/oai/openai_utils#config_list_from_json) function loads a list of configurations from an environment variable or a json file.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import autogen\n",
"from autogen import AssistantAgent, OpenAIWrapper, UserProxyAgent, gather_usage_summary\n",
"\n",
"config_list = autogen.config_list_from_json(\n",
" \"OAI_CONFIG_LIST\",\n",
" filter_dict={\n",
" \"model\": [\"gpt-3.5-turbo\", \"gpt-3.5-turbo-16k\"], # comment out to get all\n",
" },\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It first looks for an environment variable named \"OAI_CONFIG_LIST\", which must be a valid JSON string. If that variable is not found, it then looks for a JSON file named \"OAI_CONFIG_LIST\". In this example, the configs are filtered by model name (you can filter by other keys as well).\n",
"\n",
"The config list looks like the following:\n",
"```python\n",
"config_list = [\n",
" {\n",
" \"model\": \"gpt-3.5-turbo\",\n",
" \"api_key\": \"<your OpenAI API key>\",\n",
" \"tags\": [\"gpt-3.5-turbo\"],\n",
" }, # OpenAI API endpoint for gpt-3.5-turbo\n",
" {\n",
" \"model\": \"gpt-35-turbo-0613\", # 0613 or newer is needed to use functions\n",
" \"base_url\": \"<your Azure OpenAI API base>\", \n",
" \"api_type\": \"azure\", \n",
" \"api_version\": \"2024-02-01\", # 2023-07-01-preview or newer is needed to use functions\n",
" \"api_key\": \"<your Azure OpenAI API key>\",\n",
" \"tags\": [\"gpt-3.5-turbo\", \"0613\"],\n",
" }\n",
"]\n",
"```\n",
"\n",
"You can set the value of config_list in any way you prefer. Please refer to this [notebook](https://github.com/microsoft/autogen/blob/main/website/docs/topics/llm_configuration.ipynb) for full code examples of the different methods."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## OpenAIWrapper with cost estimation"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.00020600000000000002\n"
]
}
],
"source": [
"client = OpenAIWrapper(config_list=config_list)\n",
"messages = [\n",
" {\"role\": \"user\", \"content\": \"Can you give me 3 useful tips on learning Python? Keep it simple and short.\"},\n",
"]\n",
"response = client.create(messages=messages, cache_seed=None)\n",
"print(response.cost)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## OpenAIWrapper with custom token price"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Price: 109\n"
]
}
],
"source": [
"# Adding price to the config_list\n",
"for i in range(len(config_list)):\n",
" config_list[i][\"price\"] = [\n",
" 1,\n",
" 1,\n",
" ] # Note: This price is just for demonstration purposes. Please replace it with the actual price of the model.\n",
"\n",
"client = OpenAIWrapper(config_list=config_list)\n",
"messages = [\n",
" {\"role\": \"user\", \"content\": \"Can you give me 3 useful tips on learning Python? Keep it simple and short.\"},\n",
"]\n",
"response = client.create(messages=messages, cache_seed=None)\n",
"print(\"Price:\", response.cost)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Usage Summary for OpenAIWrapper\n",
"\n",
"When creating a instance of OpenAIWrapper, cost of all completions from the same instance is recorded. You can call `print_usage_summary()` to checkout your usage summary. To clear up, use `clear_usage_summary()`.\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"No usage summary. Please call \"create\" first.\n"
]
}
],
"source": [
"client = OpenAIWrapper(config_list=config_list)\n",
"messages = [\n",
" {\"role\": \"user\", \"content\": \"Can you give me 3 useful tips on learning Python? Keep it simple and short.\"},\n",
"]\n",
"client.print_usage_summary() # print usage summary"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"----------------------------------------------------------------------------------------------------\n",
"Usage summary excluding cached usage: \n",
"Total cost: 0.00023\n",
"* Model 'gpt-35-turbo': cost: 0.00023, prompt_tokens: 25, completion_tokens: 142, total_tokens: 167\n",
"\n",
"All completions are non-cached: the total cost with cached completions is the same as actual cost.\n",
"----------------------------------------------------------------------------------------------------\n",
"----------------------------------------------------------------------------------------------------\n",
"Usage summary excluding cached usage: \n",
"Total cost: 0.00023\n",
"* Model 'gpt-35-turbo': cost: 0.00023, prompt_tokens: 25, completion_tokens: 142, total_tokens: 167\n",
"----------------------------------------------------------------------------------------------------\n",
"----------------------------------------------------------------------------------------------------\n",
"Usage summary including cached usage: \n",
"Total cost: 0.00023\n",
"* Model 'gpt-35-turbo': cost: 0.00023, prompt_tokens: 25, completion_tokens: 142, total_tokens: 167\n",
"----------------------------------------------------------------------------------------------------\n"
]
}
],
"source": [
"# The first creation\n",
"# By default, cache_seed is set to 41 and enabled. If you don't want to use cache, set cache_seed to None.\n",
"response = client.create(messages=messages, cache_seed=41)\n",
"client.print_usage_summary() # default to [\"actual\", \"total\"]\n",
"client.print_usage_summary(mode=\"actual\") # print actual usage summary\n",
"client.print_usage_summary(mode=\"total\") # print total usage summary"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'total_cost': 0.0002255, 'gpt-35-turbo': {'cost': 0.0002255, 'prompt_tokens': 25, 'completion_tokens': 142, 'total_tokens': 167}}\n",
"{'total_cost': 0.0002255, 'gpt-35-turbo': {'cost': 0.0002255, 'prompt_tokens': 25, 'completion_tokens': 142, 'total_tokens': 167}}\n"
]
}
],
"source": [
"# take out cost\n",
"print(client.actual_usage_summary)\n",
"print(client.total_usage_summary)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"----------------------------------------------------------------------------------------------------\n",
"Usage summary excluding cached usage: \n",
"Total cost: 0.00023\n",
"* Model 'gpt-35-turbo': cost: 0.00023, prompt_tokens: 25, completion_tokens: 142, total_tokens: 167\n",
"\n",
"Usage summary including cached usage: \n",
"Total cost: 0.00045\n",
"* Model 'gpt-35-turbo': cost: 0.00045, prompt_tokens: 50, completion_tokens: 284, total_tokens: 334\n",
"----------------------------------------------------------------------------------------------------\n"
]
}
],
"source": [
"# Since cache is enabled, the same completion will be returned from cache and will not incur any actual cost.\n",
"# So the actual cost doesn't change, but the total cost doubles.\n",
"response = client.create(messages=messages, cache_seed=41)\n",
"client.print_usage_summary()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"No usage summary. Please call \"create\" first.\n"
]
}
],
"source": [
"# clear usage summary\n",
"client.clear_usage_summary()\n",
"client.print_usage_summary()"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"----------------------------------------------------------------------------------------------------\n",
"No actual cost incurred (all completions are using cache).\n",
"\n",
"Usage summary including cached usage: \n",
"Total cost: 0.00023\n",
"* Model 'gpt-35-turbo': cost: 0.00023, prompt_tokens: 25, completion_tokens: 142, total_tokens: 167\n",
"----------------------------------------------------------------------------------------------------\n"
]
}
],
"source": [
"# all completions are returned from cache, so no actual cost incurred.\n",
2024-04-18 09:43:34 -07:00
"response = client.create(messages=messages, cache_seed=41)\n",
2023-12-03 16:06:46 -05:00
"client.print_usage_summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Usage Summary for Agents\n",
"\n",
"- `Agent.print_usage_summary()` will print the cost summary for the agent.\n",
"- `Agent.get_actual_usage()` and `Agent.get_total_usage()` will return the usage summary in a dict. When an agent doesn't use LLM, they will return None.\n",
"- `Agent.reset()` will reset the usage summary.\n",
"- `autogen.gather_usage_summary` will gather the usage summary for a list of agents."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[33mai_user\u001b[0m (to assistant):\n",
"\n",
"$x^3=125$. What is x?\n",
"\n",
"--------------------------------------------------------------------------------\n",
"\u001b[33massistant\u001b[0m (to ai_user):\n",
"\n",
"To find x, we need to take the cube root of 125. The cube root of a number is the number that, when multiplied by itself three times, gives the original number.\n",
"\n",
"In this case, the cube root of 125 is 5 since 5 * 5 * 5 = 125. Therefore, x = 5.\n",
"\n",
"--------------------------------------------------------------------------------\n",
"\u001b[33mai_user\u001b[0m (to assistant):\n",
"\n",
"That's correct! Well done. The value of x is indeed 5, as you correctly found by taking the cube root of 125. Keep up the good work!\n",
"\n",
"--------------------------------------------------------------------------------\n",
"\u001b[33massistant\u001b[0m (to ai_user):\n",
"\n",
"Thank you! I'm glad I could help. If you have any more questions, feel free to ask!\n",
"\n",
"--------------------------------------------------------------------------------\n"
]
},
{
"data": {
"text/plain": [
"ChatResult(chat_id=None, chat_history=[{'content': '$x^3=125$. What is x?', 'role': 'assistant'}, {'content': 'To find x, we need to take the cube root of 125. The cube root of a number is the number that, when multiplied by itself three times, gives the original number.\\n\\nIn this case, the cube root of 125 is 5 since 5 * 5 * 5 = 125. Therefore, x = 5.', 'role': 'user'}, {'content': \"That's correct! Well done. The value of x is indeed 5, as you correctly found by taking the cube root of 125. Keep up the good work!\", 'role': 'assistant'}, {'content': \"Thank you! I'm glad I could help. If you have any more questions, feel free to ask!\", 'role': 'user'}], summary=\"Thank you! I'm glad I could help. If you have any more questions, feel free to ask!\", cost={'usage_including_cached_inference': {'total_cost': 0.000333, 'gpt-35-turbo': {'cost': 0.000333, 'prompt_tokens': 282, 'completion_tokens': 128, 'total_tokens': 410}}, 'usage_excluding_cached_inference': {'total_cost': 0.000333, 'gpt-35-turbo': {'cost': 0.000333, 'prompt_tokens': 282, 'completion_tokens': 128, 'total_tokens': 410}}}, human_input=[])"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"assistant = AssistantAgent(\n",
" \"assistant\",\n",
" system_message=\"You are a helpful assistant.\",\n",
" llm_config={\n",
" \"timeout\": 600,\n",
" \"cache_seed\": None,\n",
" \"config_list\": config_list,\n",
" },\n",
")\n",
"\n",
"ai_user_proxy = UserProxyAgent(\n",
" name=\"ai_user\",\n",
" human_input_mode=\"NEVER\",\n",
" max_consecutive_auto_reply=1,\n",
" code_execution_config=False,\n",
" llm_config={\n",
" \"config_list\": config_list,\n",
" },\n",
" # In the system message the \"user\" always refers to the other agent.\n",
" system_message=\"You ask a user for help. You check the answer from the user and provide feedback.\",\n",
")\n",
"assistant.reset()\n",
"\n",
"math_problem = \"$x^3=125$. What is x?\"\n",
"ai_user_proxy.initiate_chat(\n",
" assistant,\n",
" message=math_problem,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Agent 'ai_user':\n",
"----------------------------------------------------------------------------------------------------\n",
"Usage summary excluding cached usage: \n",
"Total cost: 0.00011\n",
"* Model 'gpt-35-turbo': cost: 0.00011, prompt_tokens: 114, completion_tokens: 35, total_tokens: 149\n",
"\n",
"All completions are non-cached: the total cost with cached completions is the same as actual cost.\n",
"----------------------------------------------------------------------------------------------------\n",
"\n",
"Agent 'assistant':\n",
"----------------------------------------------------------------------------------------------------\n",
"Usage summary excluding cached usage: \n",
"Total cost: 0.00022\n",
"* Model 'gpt-35-turbo': cost: 0.00022, prompt_tokens: 168, completion_tokens: 93, total_tokens: 261\n",
"\n",
"All completions are non-cached: the total cost with cached completions is the same as actual cost.\n",
"----------------------------------------------------------------------------------------------------\n"
]
}
],
"source": [
"ai_user_proxy.print_usage_summary()\n",
"print()\n",
"assistant.print_usage_summary()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"No cost incurred from agent 'user'.\n"
]
}
],
"source": [
"user_proxy = UserProxyAgent(\n",
" name=\"user\",\n",
" human_input_mode=\"NEVER\",\n",
" max_consecutive_auto_reply=2,\n",
" code_execution_config=False,\n",
" default_auto_reply=\"That's all. Thank you.\",\n",
")\n",
"user_proxy.print_usage_summary()"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Actual usage summary for assistant (excluding completion from cache): {'total_cost': 0.0002235, 'gpt-35-turbo': {'cost': 0.0002235, 'prompt_tokens': 168, 'completion_tokens': 93, 'total_tokens': 261}}\n",
"Total usage summary for assistant (including completion from cache): {'total_cost': 0.0002235, 'gpt-35-turbo': {'cost': 0.0002235, 'prompt_tokens': 168, 'completion_tokens': 93, 'total_tokens': 261}}\n",
"Actual usage summary for ai_user_proxy: {'total_cost': 0.0001095, 'gpt-35-turbo': {'cost': 0.0001095, 'prompt_tokens': 114, 'completion_tokens': 35, 'total_tokens': 149}}\n",
"Total usage summary for ai_user_proxy: {'total_cost': 0.0001095, 'gpt-35-turbo': {'cost': 0.0001095, 'prompt_tokens': 114, 'completion_tokens': 35, 'total_tokens': 149}}\n",
"Actual usage summary for user_proxy: None\n",
"Total usage summary for user_proxy: None\n"
]
}
],
"source": [
"print(\"Actual usage summary for assistant (excluding completion from cache):\", assistant.get_actual_usage())\n",
"print(\"Total usage summary for assistant (including completion from cache):\", assistant.get_total_usage())\n",
"\n",
"print(\"Actual usage summary for ai_user_proxy:\", ai_user_proxy.get_actual_usage())\n",
"print(\"Total usage summary for ai_user_proxy:\", ai_user_proxy.get_total_usage())\n",
"\n",
"print(\"Actual usage summary for user_proxy:\", user_proxy.get_actual_usage())\n",
"print(\"Total usage summary for user_proxy:\", user_proxy.get_total_usage())"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'total_cost': 0.000333,\n",
" 'gpt-35-turbo': {'cost': 0.000333,\n",
" 'prompt_tokens': 282,\n",
" 'completion_tokens': 128,\n",
" 'total_tokens': 410}}"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"usage_summary = gather_usage_summary([assistant, ai_user_proxy, user_proxy])\n",
"usage_summary[\"usage_including_cached_inference\"]"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "msft",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.19"
}
},
"nbformat": 4,
"nbformat_minor": 2
}