mirror of https://github.com/microsoft/autogen.git
synced 2025-10-31 09:50:11 +00:00
commit 7a4ba1a732: * init commit * add doc, notebook and test * fix test * update * update * update * update
309 lines · 11 KiB · Plaintext
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/microsoft/autogen/blob/main/notebook/oai_client_cost.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Copyright (c) Microsoft Corporation. All rights reserved.\n",
    "\n",
    "Licensed under the MIT License.\n",
    "\n",
    "# Use AutoGen's OpenAIWrapper for cost estimation\n",
    "The `OpenAIWrapper` from `autogen` tracks token counts and costs of your API calls. Use the `create()` method to initiate requests and `print_usage_summary()` to retrieve a detailed usage report, including the total cost and the token usage for both cached and actual requests.\n",
    "\n",
    "- `mode=[\"actual\", \"total\"]` (default): print the usage summary for non-cached completions and for all completions (including cache).\n",
    "- `mode='actual'`: only print non-cached usage.\n",
    "- `mode='total'`: only print all usage (including cache).\n",
    "\n",
    "Reset your session's usage data with `clear_usage_summary()` when needed.\n",
    "\n",
    "## Requirements\n",
    "\n",
    "AutoGen requires `Python>=3.8`:\n",
    "```bash\n",
    "pip install \"pyautogen\"\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Set your API Endpoint\n",
    "\n",
    "The [`config_list_from_json`](https://microsoft.github.io/autogen/docs/reference/oai/openai_utils#config_list_from_json) function loads a list of configurations from an environment variable or a json file.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import autogen\n",
    "\n",
    "# config_list = autogen.config_list_from_json(\n",
    "#     \"OAI_CONFIG_LIST\",\n",
    "#     filter_dict={\n",
    "#         \"model\": [\"gpt-3.5-turbo\", \"gpt-4-1106-preview\"],\n",
    "#     },\n",
    "# )\n",
    "\n",
    "config_list = autogen.config_list_from_json(\n",
    "    \"OAI_CONFIG_LIST\",\n",
    "    filter_dict={\n",
    "        \"model\": [\"gpt-3.5-turbo\"],\n",
    "    },\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It first looks for an environment variable named \"OAI_CONFIG_LIST\", which needs to be a valid json string. If that variable is not found, it then looks for a json file named \"OAI_CONFIG_LIST\". It filters the configs by model (you can filter by other keys as well).\n",
    "\n",
    "The config list looks like the following:\n",
    "```python\n",
    "config_list = [\n",
    "    {\n",
    "        \"model\": \"gpt-4\",\n",
    "        \"api_key\": \"<your OpenAI API key>\",\n",
    "    },  # OpenAI API endpoint for gpt-4\n",
    "    {\n",
    "        \"model\": \"gpt-35-turbo-0613\",  # 0613 or newer is needed to use functions\n",
    "        \"base_url\": \"<your Azure OpenAI API base>\",\n",
    "        \"api_type\": \"azure\",\n",
    "        \"api_version\": \"2023-08-01-preview\",  # 2023-07-01-preview or newer is needed to use functions\n",
    "        \"api_key\": \"<your Azure OpenAI API key>\"\n",
    "    }\n",
    "]\n",
    "```\n",
    "\n",
    "You can set the value of config_list in any way you prefer. Please refer to this [notebook](https://github.com/microsoft/autogen/blob/main/notebook/oai_openai_utils.ipynb) for full code examples of the different methods."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## OpenAIWrapper with cost estimation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "In update_usage_summary\n",
      "0.0001555\n"
     ]
    }
   ],
   "source": [
    "from autogen import OpenAIWrapper\n",
    "\n",
    "client = OpenAIWrapper(config_list=config_list)\n",
    "messages = [{'role': 'user', 'content': 'Can you give me 3 useful tips on learning Python? Keep it simple and short.'}]\n",
    "response = client.create(messages=messages, model=\"gpt-35-turbo-1106\", cache_seed=None)\n",
    "print(response.cost)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Usage Summary\n",
    "\n",
    "When creating an instance of `OpenAIWrapper`, the cost of all completions from the same instance is recorded. Call `print_usage_summary()` to check your usage summary, and `clear_usage_summary()` to reset it.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "No usage summary. Please call \"create\" first.\n"
     ]
    }
   ],
   "source": [
    "from autogen import OpenAIWrapper\n",
    "\n",
    "client = OpenAIWrapper(config_list=config_list)\n",
    "messages = [{'role': 'user', 'content': 'Can you give me 3 useful tips on learning Python? Keep it simple and short.'}]\n",
    "client.print_usage_summary()  # print usage summary"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "In update_usage_summary\n",
      "----------------------------------------------------------------------------------------------------\n",
      "Usage summary excluding cached usage: \n",
      "Total cost: 0.00026\n",
      "* Model 'gpt-35-turbo': cost: 0.00026, prompt_tokens: 25, completion_tokens: 110, total_tokens: 135\n",
      "\n",
      "All completions are non-cached: the total cost with cached completions is the same as actual cost.\n",
      "----------------------------------------------------------------------------------------------------\n",
      "----------------------------------------------------------------------------------------------------\n",
      "Usage summary excluding cached usage: \n",
      "Total cost: 0.00026\n",
      "* Model 'gpt-35-turbo': cost: 0.00026, prompt_tokens: 25, completion_tokens: 110, total_tokens: 135\n",
      "----------------------------------------------------------------------------------------------------\n",
      "----------------------------------------------------------------------------------------------------\n",
      "Usage summary including cached usage: \n",
      "Total cost: 0.00026\n",
      "* Model 'gpt-35-turbo': cost: 0.00026, prompt_tokens: 25, completion_tokens: 110, total_tokens: 135\n",
      "----------------------------------------------------------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "# The first creation.\n",
    "# By default, cache_seed is set to 41 and caching is enabled. If you don't want to use the cache, set cache_seed to None.\n",
    "response = client.create(messages=messages, model=\"gpt-35-turbo-1106\", cache_seed=41)\n",
    "client.print_usage_summary()  # defaults to mode=[\"actual\", \"total\"]\n",
    "client.print_usage_summary(mode='actual')  # print the actual usage summary\n",
    "client.print_usage_summary(mode='total')  # print the total usage summary"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'total_cost': 0.0002575, 'gpt-35-turbo': {'cost': 0.0002575, 'prompt_tokens': 25, 'completion_tokens': 110, 'total_tokens': 135}}\n",
      "{'total_cost': 0.0002575, 'gpt-35-turbo': {'cost': 0.0002575, 'prompt_tokens': 25, 'completion_tokens': 110, 'total_tokens': 135}}\n"
     ]
    }
   ],
   "source": [
    "# Access the usage summaries directly as dictionaries.\n",
    "print(client.actual_usage_summary)\n",
    "print(client.total_usage_summary)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "In update_usage_summary\n",
      "----------------------------------------------------------------------------------------------------\n",
      "Usage summary excluding cached usage: \n",
      "Total cost: 0.00026\n",
      "* Model 'gpt-35-turbo': cost: 0.00026, prompt_tokens: 25, completion_tokens: 110, total_tokens: 135\n",
      "\n",
      "Usage summary including cached usage: \n",
      "Total cost: 0.00052\n",
      "* Model 'gpt-35-turbo': cost: 0.00052, prompt_tokens: 50, completion_tokens: 220, total_tokens: 270\n",
      "----------------------------------------------------------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "# Since the cache is enabled, the same completion is returned from the cache and incurs no actual cost.\n",
    "# So the actual cost doesn't change, but the total cost doubles.\n",
    "response = client.create(messages=messages, model=\"gpt-35-turbo-1106\", cache_seed=41)\n",
    "client.print_usage_summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "No usage summary. Please call \"create\" first.\n"
     ]
    }
   ],
   "source": [
    "# Clear the usage summary.\n",
    "client.clear_usage_summary()\n",
    "client.print_usage_summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "In update_usage_summary\n",
      "----------------------------------------------------------------------------------------------------\n",
      "No actual cost incurred (all completions are using cache).\n",
      "\n",
      "Usage summary including cached usage: \n",
      "Total cost: 0.00026\n",
      "* Model 'gpt-35-turbo': cost: 0.00026, prompt_tokens: 25, completion_tokens: 110, total_tokens: 135\n",
      "----------------------------------------------------------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "# All completions are returned from the cache, so no actual cost is incurred.\n",
    "response = client.create(messages=messages, model=\"gpt-35-turbo-1106\", cache_seed=41)\n",
    "client.print_usage_summary()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "msft",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.18"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}