autogen/notebook/autogen_openai_completion.ipynb

{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/autogen_openai_completion.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Copyright (c) Microsoft Corporation. All rights reserved. \n",
    "\n",
    "Licensed under the MIT License.\n",
    "\n",
    "# Use FLAML to Tune OpenAI Models\n",
    "\n",
    "FLAML offers a cost-effective hyperparameter optimization technique [EcoOptiGen](https://arxiv.org/abs/2303.04673) for tuning Large Language Models. Our study finds that tuning hyperparameters can significantly improve the utility of LLMs.\n",
    "\n",
    "In this notebook, we tune OpenAI models for code generation. We use [the HumanEval benchmark](https://huggingface.co/datasets/openai_humaneval) released by OpenAI for synthesizing programs from docstrings. \n",
    "\n",
    "## Requirements\n",
    "\n",
    "FLAML requires `Python>=3.7`. To run this notebook example, please install flaml with the [autogen,blendsearch] option:\n",
    "```bash\n",
    "pip install flaml[autogen,blendsearch]\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2023-02-24T23:25:36.910966Z",
     "iopub.status.busy": "2023-02-24T23:25:36.910473Z",
     "iopub.status.idle": "2023-02-24T23:25:36.914554Z",
     "shell.execute_reply": "2023-02-24T23:25:36.914030Z"
    }
   },
   "outputs": [],
   "source": [
    "# %pip install flaml[autogen,blendsearch] datasets"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Set your OpenAI key:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2023-02-24T23:25:36.917301Z",
     "iopub.status.busy": "2023-02-24T23:25:36.917011Z",
     "iopub.status.idle": "2023-02-24T23:25:36.923156Z",
     "shell.execute_reply": "2023-02-24T23:25:36.922619Z"
    }
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "if \"OPENAI_API_KEY\" not in os.environ:\n",
    "    os.environ[\"OPENAI_API_KEY\"] = \"<your OpenAI API key here>\""
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you use Azure OpenAI, uncomment the following:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2023-02-24T23:25:36.925804Z",
     "iopub.status.busy": "2023-02-24T23:25:36.925423Z",
     "iopub.status.idle": "2023-02-24T23:25:36.928191Z",
     "shell.execute_reply": "2023-02-24T23:25:36.927673Z"
    }
   },
   "outputs": [],
   "source": [
    "# import openai\n",
    "# openai.api_type = \"azure\"\n",
    "# openai.api_base = \"https://<your_endpoint>.openai.azure.com/\"\n",
    "# openai.api_version = \"2023-03-15-preview\"  # change if necessary"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load dataset\n",
    "\n",
    "First, we load the HumanEval dataset. The dataset contains 164 examples. After shuffling, we use the first 20 for tuning the generation hyperparameters and the remaining 144 for evaluation. In each example, the \"prompt\" is the prompt string for eliciting the code generation (renamed to \"definition\"), \"test\" is the Python unit-test code for the example, and \"entry_point\" is the name of the function to be tested."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2023-02-24T23:25:36.931255Z",
     "iopub.status.busy": "2023-02-24T23:25:36.930838Z",
     "iopub.status.idle": "2023-02-24T23:25:39.148799Z",
     "shell.execute_reply": "2023-02-24T23:25:39.148113Z"
    }
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Found cached dataset openai_humaneval (/home/vscode/.cache/huggingface/datasets/openai_humaneval/openai_humaneval/1.0.0/2955cebd73602e828fa8c0a424c594e5fab4ec863b316ca98f3d8fdb6a626e75)\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "0be40d7ad7f049f1946bd69b0c570f33",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "  0%|          | 0/1 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Loading cached shuffled indices for dataset at /home/vscode/.cache/huggingface/datasets/openai_humaneval/openai_humaneval/1.0.0/2955cebd73602e828fa8c0a424c594e5fab4ec863b316ca98f3d8fdb6a626e75/cache-1e8448101c1b32e8.arrow\n"
     ]
    }
   ],
   "source": [
    "import datasets\n",
    "\n",
    "seed = 41\n",
    "data = datasets.load_dataset(\"openai_humaneval\")[\"test\"].shuffle(seed=seed)\n",
    "n_tune_data = 20\n",
    "tune_data = [\n",
    "    {\n",
    "        \"definition\": data[x][\"prompt\"],\n",
    "        \"test\": data[x][\"test\"],\n",
    "        \"entry_point\": data[x][\"entry_point\"],\n",
    "    }\n",
    "    for x in range(n_tune_data)\n",
    "]\n",
    "test_data = [\n",
    "    {\n",
    "        \"definition\": data[x][\"prompt\"],\n",
    "        \"test\": data[x][\"test\"],\n",
    "        \"entry_point\": data[x][\"entry_point\"],\n",
    "    }\n",
    "    for x in range(n_tune_data, len(data))\n",
    "]\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Check a tuning example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2023-02-24T23:25:39.152156Z",
     "iopub.status.busy": "2023-02-24T23:25:39.151531Z",
     "iopub.status.idle": "2023-02-24T23:25:39.155313Z",
     "shell.execute_reply": "2023-02-24T23:25:39.154731Z"
    },
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "def compare(game,guess):\n",
      "    \"\"\"I think we all remember that feeling when the result of some long-awaited\n",
      "    event is finally known. The feelings and thoughts you have at that moment are\n",
      "    definitely worth noting down and comparing.\n",
      "    Your task is to determine if a person correctly guessed the results of a number of matches.\n",
      "    You are given two arrays of scores and guesses of equal length, where each index shows a match. \n",
      "    Return an array of the same length denoting how far off each guess was. If they have guessed correctly,\n",
      "    the value is 0, and if not, the value is the absolute difference between the guess and the score.\n",
      "    \n",
      "    \n",
      "    example:\n",
      "\n",
      "    compare([1,2,3,4,5,1],[1,2,3,4,2,-2]) -> [0,0,0,0,3,3]\n",
      "    compare([0,5,0,0,0,4],[4,1,1,0,0,-2]) -> [4,4,1,0,0,6]\n",
      "    \"\"\"\n",
      "\n"
     ]
    }
   ],
   "source": [
    "print(tune_data[1][\"definition\"])"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here is one example of the unit test code for verifying the correctness of the generated code:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2023-02-24T23:25:39.158398Z",
     "iopub.status.busy": "2023-02-24T23:25:39.157766Z",
     "iopub.status.idle": "2023-02-24T23:25:39.161396Z",
     "shell.execute_reply": "2023-02-24T23:25:39.160797Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "def check(candidate):\n",
      "\n",
      "    # Check some simple cases\n",
      "    assert candidate([1,2,3,4,5,1],[1,2,3,4,2,-2])==[0,0,0,0,3,3], \"This prints if this assert fails 1 (good for debugging!)\"\n",
      "    assert candidate([0,0,0,0,0,0],[0,0,0,0,0,0])==[0,0,0,0,0,0], \"This prints if this assert fails 1 (good for debugging!)\"\n",
      "    assert candidate([1,2,3],[-1,-2,-3])==[2,4,6], \"This prints if this assert fails 1 (good for debugging!)\"\n",
      "    assert candidate([1,2,3,5],[-1,2,3,4])==[2,0,0,1], \"This prints if this assert fails 1 (good for debugging!)\"\n",
      "\n",
      "    # Check some edge cases that are easy to work out by hand.\n",
      "    assert True, \"This prints if this assert fails 2 (also good for debugging!)\"\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "print(tune_data[1][\"test\"])"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define Success Metric\n",
    "\n",
    "Before we start tuning, we need to define the success metric we want to optimize. For each code generation task, we can use the model to generate multiple candidates and then select one of them. If the selected response passes a unit test, we consider the task successfully solved. The metric is then the mean success rate over a collection of tasks."
   ]
  },
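  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the metric concrete, here is a minimal sketch of the idea, independent of FLAML's own implementation. `passes_unit_test` and `mean_success_rate` are hypothetical helpers, and the toy test and candidates are made up for illustration. Running generated code with `exec` is unsafe outside a sandbox.\n",
    "\n",
    "```python\n",
    "def passes_unit_test(candidate_code, test_code, entry_point):\n",
    "    # Run the candidate and its test in a scratch namespace.\n",
    "    # Executing generated code directly is unsafe outside a sandbox.\n",
    "    namespace = {}\n",
    "    try:\n",
    "        exec(candidate_code, namespace)\n",
    "        exec(test_code, namespace)\n",
    "        namespace['check'](namespace[entry_point])\n",
    "        return True\n",
    "    except Exception:\n",
    "        return False\n",
    "\n",
    "\n",
    "def mean_success_rate(outcomes):\n",
    "    # outcomes: one boolean per task (did the selected response pass?)\n",
    "    return sum(outcomes) / len(outcomes)\n",
    "\n",
    "\n",
    "toy_test = '''def check(candidate):\n",
    "    assert candidate([1, 2], [1, 4]) == [0, 2]\n",
    "'''\n",
    "good = '''def compare(game, guess):\n",
    "    return [abs(a - b) for a, b in zip(game, guess)]\n",
    "'''\n",
    "bad = '''def compare(game, guess):\n",
    "    return game\n",
    "'''\n",
    "outcomes = [passes_unit_test(c, toy_test, 'compare') for c in (good, bad)]\n",
    "print(outcomes, mean_success_rate(outcomes))\n",
    "```"
   ]
  },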
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2023-02-24T23:25:39.164187Z",
     "iopub.status.busy": "2023-02-24T23:25:39.163867Z",
     "iopub.status.idle": "2023-02-24T23:25:39.169009Z",
     "shell.execute_reply": "2023-02-24T23:25:39.168427Z"
    }
   },
   "outputs": [],
   "source": [
    "from functools import partial\n",
    "from flaml.autogen.code_utils import eval_function_completions, generate_assertions\n",
    "\n",
    "eval_with_generated_assertions = partial(\n",
    "    eval_function_completions,\n",
    "    assertions=generate_assertions,\n",
    "    use_docker=False,\n",
    "    # Please set use_docker=True if you have docker available to run the generated code.\n",
    "    # Using docker is safer than running the generated code directly.\n",
    ")\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "This function will first generate assertion statements for each problem. Then, it uses the assertions to select among the generated responses.\n",
    "\n",
    "## Use the tuning data to find a good configuration\n",
    "\n",
    "### Import the oai subpackage from flaml\n",
    "\n",
    "FLAML provides an API for hyperparameter optimization of OpenAI models, `oai.Completion.tune`, and an API for making a request with the tuned config, `oai.Completion.create`. First, we import oai from flaml:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2023-02-24T23:25:39.179030Z",
     "iopub.status.busy": "2023-02-24T23:25:39.178624Z",
     "iopub.status.idle": "2023-02-24T23:25:40.584410Z",
     "shell.execute_reply": "2023-02-24T23:25:40.583802Z"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "from flaml import oai"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For (local) reproducibility and cost efficiency, we cache responses from OpenAI."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2023-02-24T23:25:40.587815Z",
     "iopub.status.busy": "2023-02-24T23:25:40.587283Z",
     "iopub.status.idle": "2023-02-24T23:25:40.590826Z",
     "shell.execute_reply": "2023-02-24T23:25:40.590158Z"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "oai.Completion.set_cache(seed)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This will create a disk cache in \".cache/{seed}\". You can change `cache_path` in `set_cache()`. The caches for different seeds are stored separately.\n",
    "\n",
    "### Perform tuning\n",
    "\n",
    "Tuning will take a while to finish, depending on the optimization budget. It is performed under the following budgets and limits:\n",
    "\n",
    "* `inference_budget` is the target average inference budget per instance in the benchmark. For example, 0.02 means the target inference budget is 0.02 dollars, which translates to 1000 tokens (input + output combined) if the text Davinci model is used.\n",
    "* `optimization_budget` is the total budget allowed to perform the tuning. For example, 5 means 5 dollars are allowed in total, which translates to 250K tokens for the text Davinci model.\n",
    "* `num_samples` is the number of different hyperparameter configurations that may be tried. The tuning stops after `num_samples` trials or once `optimization_budget` dollars are spent, whichever happens first. -1 means there is no hard limit on the number of trials, and the actual number is determined by `optimization_budget`.\n",
    "\n",
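    "As a rough sketch of the arithmetic behind these budgets, the snippet below converts a dollar amount into an approximate token count. It assumes the price implied above (0.02 dollars per 1000 tokens for the text Davinci model); `budget_to_tokens` is a hypothetical helper, not part of FLAML.\n",
    "\n",
    "```python\n",
    "# Assumed price, implied by the text above: 0.02 dollars per 1000 tokens.\n",
    "PRICE_PER_1K_TOKENS = 0.02\n",
    "\n",
    "\n",
    "def budget_to_tokens(budget_dollars, price_per_1k=PRICE_PER_1K_TOKENS):\n",
    "    # Convert a dollar budget into an approximate number of tokens.\n",
    "    return round(budget_dollars / price_per_1k * 1000)\n",
    "\n",
    "\n",
    "print(budget_to_tokens(0.02))  # inference_budget example from the text\n",
    "print(budget_to_tokens(5))  # optimization_budget example from the text\n",
    "```\n",
    "\n",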
    "Users can specify the tuning data, optimization metric, optimization mode, evaluation function, search spaces, etc. The default search space is:\n",
    "\n",
    "```python\n",
    "default_search_space = {\n",
    "    \"model\": tune.choice([\n",
    "        \"text-ada-001\",\n",
    "        \"text-babbage-001\",\n",
    "        \"text-davinci-003\",\n",
    "        \"gpt-3.5-turbo\",\n",
    "        \"gpt-4\",\n",
    "    ]),\n",
    "    \"temperature_or_top_p\": tune.choice(\n",
    "        [\n",
    "            {\"temperature\": tune.uniform(0, 1)},\n",
    "            {\"top_p\": tune.uniform(0, 1)},\n",
    "        ]\n",
    "    ),\n",
    "    \"max_tokens\": tune.lograndint(50, 1000),\n",
    "    \"n\": tune.randint(1, 100),\n",
    "    \"prompt\": \"{prompt}\",\n",
    "}\n",
    "```\n",
    "\n",
    "The default search space can be overridden by users' input.\n",
    "For example, the following code specifies three choices for the prompt and two choices of stop sequences. For hyperparameters which don't appear in users' input, the default search space will be used. If you don't have access to gpt-4 or would like to modify the choice of models, you can provide a different search space for model."
   ]
  },
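  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each prompt template contains a `{definition}` placeholder that corresponds to a key in the tuning data. The sketch below shows how such a template could be filled from a data instance, assuming `str.format`-style substitution; the instance is a made-up toy example, and FLAML's internal handling may differ.\n",
    "\n",
    "```python\n",
    "templates = [\n",
    "    '{definition}',\n",
    "    '# Python 3{definition}',\n",
    "    'Complete the following Python function:{definition}',\n",
    "]\n",
    "\n",
    "# Toy instance with the same key as the tuning data (hypothetical content).\n",
    "instance = {'definition': '\\ndef add(a, b):'}\n",
    "\n",
    "# Fill every template with the fields of the instance.\n",
    "prompts = [template.format(**instance) for template in templates]\n",
    "for prompt in prompts:\n",
    "    print(repr(prompt))\n",
    "```"
   ]
  },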
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2023-02-24T23:25:40.593603Z",
     "iopub.status.busy": "2023-02-24T23:25:40.593269Z",
     "iopub.status.idle": "2023-02-24T23:26:38.349191Z",
     "shell.execute_reply": "2023-02-24T23:26:38.348392Z"
    }
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[32m[I 2023-04-07 17:47:31,801]\u001b[0m A new study created in memory with name: optuna\u001b[0m\n",
      "\u001b[32m[I 2023-04-07 17:47:31,804]\u001b[0m A new study created in memory with name: optuna\u001b[0m\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[flaml.tune.tune: 04-07 17:47:31] {832} INFO - trial 1 config: {'prompt': 1, 'stop': 0, 'subspace': {'model': 'text-ada-001', 'max_tokens': 148, 'temperature_or_top_p': {'top_p': 0.755486898036596}, 'n': 27}}\n",
      "[flaml.tune.tune: 04-07 17:47:48] {215} INFO - result: {'index_selected': 26.0, 'succeed_assertions': 0.0, 'success': 0.0, 'gen_cost': 0.00046369999999999994, 'assertions': 'assert vowels_count(\"abcde\") == 2\\nassert vowels_count(\"ACEDY\") == 3', 'total_cost': 0.010323600000000004, 'cost': 0.010323600000000004, 'inference_cost': 0.00022578, 'training_iteration': 0, 'config': {'prompt': 1, 'stop': 0, 'subspace': {'model': 'text-ada-001', 'max_tokens': 148, 'temperature_or_top_p': {'top_p': 0.755486898036596}, 'n': 27}}, 'config/prompt': 1, 'config/stop': 0, 'config/subspace': {'model': 'text-ada-001', 'max_tokens': 148, 'temperature_or_top_p': {'top_p': 0.755486898036596}, 'n': 27}, 'experiment_tag': 'exp', 'time_total_s': 16.660529136657715}\n",
      "[flaml.tune.tune: 04-07 17:47:48] {832} INFO - trial 2 config: {'prompt': 1, 'stop': 0, 'subspace': {'model': 'text-babbage-001', 'max_tokens': 148, 'temperature_or_top_p': {'top_p': 0.755486898036596}, 'n': 27}}\n",
      "[flaml.tune.tune: 04-07 17:48:05] {215} INFO - result: {'index_selected': 26.0, 'succeed_assertions': 0.0, 'success': 0.0, 'gen_cost': 0.00046369999999999994, 'assertions': 'assert vowels_count(\"abcde\") == 2\\nassert vowels_count(\"ACEDY\") == 3', 'total_cost': 0.03038410000000001, 'cost': 0.020060500000000002, 'inference_cost': 0.001003025, 'training_iteration': 0, 'config': {'prompt': 1, 'stop': 0, 'subspace': {'model': 'text-babbage-001', 'max_tokens': 148, 'temperature_or_top_p': {'top_p': 0.755486898036596}, 'n': 27}}, 'config/prompt': 1, 'config/stop': 0, 'config/subspace': {'model': 'text-babbage-001', 'max_tokens': 148, 'temperature_or_top_p': {'top_p': 0.755486898036596}, 'n': 27}, 'experiment_tag': 'exp', 'time_total_s': 16.726527452468872}\n",
      "[flaml.tune.tune: 04-07 17:48:05] {832} INFO - trial 3 config: {'prompt': 1, 'stop': 0, 'subspace': {'model': 'text-davinci-003', 'max_tokens': 148, 'temperature_or_top_p': {'top_p': 0.755486898036596}, 'n': 27}}\n",
      "[flaml.tune.tune: 04-07 17:48:08] {215} INFO - result: {'index_selected': 3.95, 'succeed_assertions': 0.9, 'success': 0.75, 'gen_cost': 0.00046369999999999994, 'assertions': 'assert vowels_count(\"abcde\") == 2\\nassert vowels_count(\"ACEDY\") == 3', 'total_cost': 0.8871640999999999, 'cost': 0.8567799999999999, 'inference_cost': 0.042096, 'training_iteration': 0, 'config': {'prompt': 1, 'stop': 0, 'subspace': {'model': 'text-davinci-003', 'max_tokens': 148, 'temperature_or_top_p': {'top_p': 0.755486898036596}, 'n': 27}}, 'config/prompt': 1, 'config/stop': 0, 'config/subspace': {'model': 'text-davinci-003', 'max_tokens': 148, 'temperature_or_top_p': {'top_p': 0.755486898036596}, 'n': 27}, 'experiment_tag': 'exp', 'time_total_s': 3.7132015228271484}\n",
      "[flaml.tune.tune: 04-07 17:48:08] {832} INFO - trial 4 config: {'prompt': 1, 'stop': 0, 'subspace': {'model': 'gpt-3.5-turbo', 'max_tokens': 148, 'temperature_or_top_p': {'top_p': 0.755486898036596}, 'n': 27}}\n",
      "[flaml.tune.tune: 04-07 17:48:18] {215} INFO - result: {'index_selected': 13.85, 'succeed_assertions': 0.55, 'success': 0.5, 'gen_cost': 0.00046369999999999994, 'assertions': 'assert vowels_count(\"abcde\") == 2\\nassert vowels_count(\"ACEDY\") == 3', 'total_cost': 0.9526220999999998, 'cost': 0.065458, 'inference_cost': 0.0033335, 'training_iteration': 0, 'config': {'prompt': 1, 'stop': 0, 'subspace': {'model': 'gpt-3.5-turbo', 'max_tokens': 148, 'temperature_or_top_p': {'top_p': 0.755486898036596}, 'n': 27}}, 'config/prompt': 1, 'config/stop': 0, 'config/subspace': {'model': 'gpt-3.5-turbo', 'max_tokens': 148, 'temperature_or_top_p': {'top_p': 0.755486898036596}, 'n': 27}, 'experiment_tag': 'exp', 'time_total_s': 9.689077615737915}\n",
      "[flaml.tune.tune: 04-07 17:48:18] {832} INFO - trial 5 config: {'prompt': 1, 'stop': 0, 'subspace': {'model': 'gpt-4', 'max_tokens': 148, 'temperature_or_top_p': {'top_p': 0.755486898036596}, 'n': 27}}\n",
      "[flaml.tune.tune: 04-07 17:48:18] {215} INFO - result: {'success': 0, 'total_cost': 1.0297820999999998, 'cost': 0.07715999999999999, 'training_iteration': 0, 'config': {'prompt': 1, 'stop': 0, 'subspace': {'model': 'gpt-4', 'max_tokens': 148, 'temperature_or_top_p': {'top_p': 0.755486898036596}, 'n': 27}}, 'config/prompt': 1, 'config/stop': 0, 'config/subspace': {'model': 'gpt-4', 'max_tokens': 148, 'temperature_or_top_p': {'top_p': 0.755486898036596}, 'n': 27}, 'experiment_tag': 'exp', 'time_total_s': 0.002007722854614258}\n",
      "[flaml.tune.tune: 04-07 17:48:18] {855} WARNING - fail to sample a trial for 100 times in a row, stopping.\n"
     ]
    }
   ],
   "source": [
    "config, analysis = oai.Completion.tune(\n",
    "    data=tune_data,  # the data for tuning\n",
    "    metric=\"success\",  # the metric to optimize\n",
    "    mode=\"max\",  # the optimization mode\n",
    "    eval_func=eval_with_generated_assertions,  # the evaluation function to return the success metrics\n",
    "    # log_file_name=\"logs/humaneval.log\",  # the log file name\n",
    "    inference_budget=0.05,  # the inference budget (dollar per instance)\n",
    "    optimization_budget=1,  # the optimization budget (dollar in total)\n",
    "    # num_samples can further limit the number of trials for different hyperparameter configurations;\n",
    "    # -1 means decided by the optimization budget only\n",
    "    num_samples=-1,\n",
    "    prompt=[\n",
    "        \"{definition}\",\n",
    "        \"# Python 3{definition}\",\n",
    "        \"Complete the following Python function:{definition}\",\n",
    "    ],  # the prompt templates to choose from\n",
    "    stop=[[\"\\nclass\", \"\\ndef\", \"\\nif\", \"\\nprint\"], None],  # the stop sequences\n",
    ")\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Output tuning results\n",
    "\n",
    "After the tuning, we can print out the config and the result found by FLAML:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2023-02-24T23:26:38.352710Z",
     "iopub.status.busy": "2023-02-24T23:26:38.352378Z",
     "iopub.status.idle": "2023-02-24T23:26:38.356939Z",
     "shell.execute_reply": "2023-02-24T23:26:38.356217Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "optimized config {'prompt': '# Python 3{definition}', 'stop': ['\\nclass', '\\ndef', '\\nif', '\\nprint'], 'model': 'text-davinci-003', 'max_tokens': 148, 'n': 27, 'top_p': 0.755486898036596}\n",
      "best result on tuning data {'index_selected': 3.95, 'succeed_assertions': 0.9, 'success': 0.75, 'gen_cost': 0.00046369999999999994, 'assertions': 'assert vowels_count(\"abcde\") == 2\\nassert vowels_count(\"ACEDY\") == 3', 'total_cost': 0.8871640999999999, 'cost': 0.8567799999999999, 'inference_cost': 0.042096, 'training_iteration': 0, 'config': {'prompt': 1, 'stop': 0, 'subspace': {'model': 'text-davinci-003', 'max_tokens': 148, 'temperature_or_top_p': {'top_p': 0.755486898036596}, 'n': 27}}, 'config/prompt': 1, 'config/stop': 0, 'config/subspace': {'model': 'text-davinci-003', 'max_tokens': 148, 'temperature_or_top_p': {'top_p': 0.755486898036596}, 'n': 27}, 'experiment_tag': 'exp', 'time_total_s': 3.7132015228271484}\n"
     ]
    }
   ],
   "source": [
    "print(\"optimized config\", config)\n",
    "print(\"best result on tuning data\", analysis.best_result)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Make a request with the tuned config\n",
    "\n",
    "We can apply the tuned config on the request for an example task:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2023-02-24T23:26:38.359902Z",
     "iopub.status.busy": "2023-02-24T23:26:38.359506Z",
     "iopub.status.idle": "2023-02-24T23:26:39.343921Z",
     "shell.execute_reply": "2023-02-24T23:26:39.343051Z"
    },
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "  \"choices\": [\n",
      "    {\n",
      "      \"finish_reason\": \"stop\",\n",
      "      \"index\": 0,\n",
      "      \"logprobs\": null,\n",
      "      \"text\": \"    result = []\\n    for i in range(len(game)):\\n        result.append(abs(game[i]-guess[i]))\\n    return result\"\n",
      "    },\n",
      "    {\n",
      "      \"finish_reason\": \"stop\",\n",
      "      \"index\": 1,\n",
      "      \"logprobs\": null,\n",
      "      \"text\": \"    result = []\\n    for i in range(len(game)):\\n        result.append(abs(game[i] - guess[i]))\\n    return result\"\n",
      "    },\n",
      "    {\n",
      "      \"finish_reason\": \"stop\",\n",
      "      \"index\": 2,\n",
      "      \"logprobs\": null,\n",
      "      \"text\": \"    result = []\\n    for i in range(len(game)):\\n        diff = abs(game[i] - guess[i])\\n        result.append(diff)\\n    return result\"\n",
      "    },\n",
      "    {\n",
      "      \"finish_reason\": \"stop\",\n",
      "      \"index\": 3,\n",
      "      \"logprobs\": null,\n",
      "      \"text\": \"    result = []\\n    for i in range(len(game)):\\n        result.append(abs(game[i] - guess[i]))\\n    return result\"\n",
      "    },\n",
      "    {\n",
      "      \"finish_reason\": \"stop\",\n",
      "      \"index\": 4,\n",
      "      \"logprobs\": null,\n",
      "      \"text\": \"    result = []\\n    for i in range(len(game)):\\n        diff = abs(game[i] - guess[i])\\n        result.append(diff)\\n    return result\"\n",
      "    },\n",
      "    {\n",
      "      \"finish_reason\": \"stop\",\n",
      "      \"index\": 5,\n",
      "      \"logprobs\": null,\n",
      "      \"text\": \"    result = []\\n    for i in range(len(game)):\\n        result.append(abs(game[i] - guess[i]))\\n    return result\"\n",
      "    },\n",
      "    {\n",
      "      \"finish_reason\": \"stop\",\n",
      "      \"index\": 6,\n",
      "      \"logprobs\": null,\n",
      "      \"text\": \"    results = []\\n    for i in range(len(game)):\\n        if game[i] == guess[i]:\\n            results.append(0)\\n        else:\\n            results.append(abs(game[i] - guess[i]))\\n    return results\"\n",
      "    },\n",
      "    {\n",
      "      \"finish_reason\": \"stop\",\n",
      "      \"index\": 7,\n",
      "      \"logprobs\": null,\n",
      "      \"text\": \"    res = []\\n    for i in range(len(game)):\\n        res.append(abs(game[i] - guess[i]))\\n    return res\"\n",
      "    },\n",
      "    {\n",
      "      \"finish_reason\": \"stop\",\n",
      "      \"index\": 8,\n",
      "      \"logprobs\": null,\n",
      "      \"text\": \"    result = []\\n    for i in range(len(game)):\\n        result.append(abs(game[i] - guess[i]))\\n    return result\"\n",
      "    },\n",
      "    {\n",
      "      \"finish_reason\": \"stop\",\n",
      "      \"index\": 9,\n",
      "      \"logprobs\": null,\n",
      "      \"text\": \"    result = []\\n    for i in range(len(game)):\\n        result.append(abs(game[i]-guess[i]))\\n    return result\"\n",
      "    },\n",
      "    {\n",
      "      \"finish_reason\": \"stop\",\n",
      "      \"index\": 10,\n",
      "      \"logprobs\": null,\n",
      "      \"text\": \"    result = []\\n    for i in range(len(game)):\\n        result.append(abs(game[i] - guess[i]))\\n    return result\"\n",
      "    },\n",
      "    {\n",
      "      \"finish_reason\": \"stop\",\n",
      "      \"index\": 11,\n",
      "      \"logprobs\": null,\n",
      "      \"text\": \"    result = []\\n    for i in range(len(game)):\\n        diff = abs(game[i] - guess[i])\\n        result.append(diff)\\n    return result\"\n",
      "    },\n",
      "    {\n",
      "      \"finish_reason\": \"stop\",\n",
      "      \"index\": 12,\n",
      "      \"logprobs\": null,\n",
      "      \"text\": \"    result = []\\n    for i in range(len(game)):\\n        if game[i] == guess[i]:\\n            result.append(0)\\n        else:\\n            result.append(abs(game[i] - guess[i]))\\n    return result\"\n",
      "    },\n",
      "    {\n",
      "      \"finish_reason\": \"stop\",\n",
      "      \"index\": 13,\n",
      "      \"logprobs\": null,\n",
      "      \"text\": \"    #your code here\\n    result = []\\n    for i in range(len(game)):\\n        if game[i] == guess[i]:\\n            result.append(0)\\n        else:\\n            result.append(abs(game[i] - guess[i]))\\n    return result\"\n",
      "    },\n",
      "    {\n",
      "      \"finish_reason\": \"stop\",\n",
      "      \"index\": 14,\n",
      "      \"logprobs\": null,\n",
      "      \"text\": \"    result = []\\n    for i in range(len(game)):\\n        result.append(abs(game[i] - guess[i]))\\n    return result\"\n",
      "    },\n",
      "    {\n",
      "      \"finish_reason\": \"stop\",\n",
      "      \"index\": 15,\n",
      "      \"logprobs\": null,\n",
      "      \"text\": \"    result = []\\n    for i in range(len(game)):\\n        diff = abs(game[i] - guess[i])\\n        result.append(diff)\\n    return result\"\n",
      "    },\n",
      "    {\n",
      "      \"finish_reason\": \"stop\",\n",
      "      \"index\": 16,\n",
      "      \"logprobs\": null,\n",
      "      \"text\": \"    result = []\\n    for i in range(len(game)):\\n        result.append(abs(game[i]-guess[i]))\\n    return result\"\n",
      "    },\n",
      "    {\n",
      "      \"finish_reason\": \"stop\",\n",
      "      \"index\": 17,\n",
      "      \"logprobs\": null,\n",
      "      \"text\": \"    result = []\\n    for i in range(len(game)):\\n        result.append(abs(game[i] - guess[i]))\\n    return result\"\n",
      "    },\n",
      "    {\n",
      "      \"finish_reason\": \"stop\",\n",
      "      \"index\": 18,\n",
      "      \"logprobs\": null,\n",
      "      \"text\": \"    # Your code here\\n    result = []\\n    for i in range(len(game)):\\n        result.append(abs(game[i] - guess[i]))\\n    return result\"\n",
      "    },\n",
      "    {\n",
      "      \"finish_reason\": \"stop\",\n",
      "      \"index\": 19,\n",
      "      \"logprobs\": null,\n",
      "      \"text\": \"    result = []\\n    for i in range(len(game)):\\n        result.append(abs(game[i] - guess[i]))\\n    return result\"\n",
      "    },\n",
      "    {\n",
      "      \"finish_reason\": \"stop\",\n",
      "      \"index\": 20,\n",
      "      \"logprobs\": null,\n",
      "      \"text\": \"    #create an empty list\\n    result = []\\n    #iterate over the two lists and compare the values\\n    for i in range(len(game)):\\n        diff = abs(game[i] - guess[i])\\n        result.append(diff)\\n    return result\"\n",
      "    },\n",
      "    {\n",
      "      \"finish_reason\": \"stop\",\n",
      "      \"index\": 21,\n",
      "      \"logprobs\": null,\n",
      "      \"text\": \"    result = []\\n    for i in range(len(game)):\\n        result.append(abs(game[i] - guess[i]))\\n    return result\"\n",
      "    },\n",
      "    {\n",
      "      \"finish_reason\": \"stop\",\n",
      "      \"index\": 22,\n",
      "      \"logprobs\": null,\n",
      "      \"text\": \"    # initialize the result array\\n    result = []\\n    \\n    # loop over the arrays and calculate the difference\\n    for i in range(len(game)):\\n        diff = abs(game[i] - guess[i])\\n        result.append(diff)\\n    \\n    return result\"\n",
      "    },\n",
      "    {\n",
      "      \"finish_reason\": \"stop\",\n",
      "      \"index\": 23,\n",
      "      \"logprobs\": null,\n",
      "      \"text\": \"    result = []\\n    for i in range(len(game)):\\n        result.append(abs(game[i]-guess[i]))\\n    return result\"\n",
      "    },\n",
      "    {\n",
      "      \"finish_reason\": \"stop\",\n",
      "      \"index\": 24,\n",
      "      \"logprobs\": null,\n",
      "      \"text\": \"    result = []\\n    for i in range(len(game)):\\n        diff = abs(game[i] - guess[i])\\n        result.append(diff)\\n    return result\"\n",
      "    },\n",
      "    {\n",
      "      \"finish_reason\": \"stop\",\n",
      "      \"index\": 25,\n",
      "      \"logprobs\": null,\n",
      "      \"text\": \"    # Your code here\\n    result = []\\n    for i in range(len(game)):\\n        diff = abs(game[i] - guess[i])\\n        result.append(diff)\\n    return result\"\n",
      "    },\n",
      "    {\n",
      "      \"finish_reason\": \"stop\",\n",
      "      \"index\": 26,\n",
      "      \"logprobs\": null,\n",
      "      \"text\": \"    result = []\\n    for i in range(len(game)):\\n        result.append(abs(game[i]-guess[i]))\\n    return result\"\n",
      "    }\n",
      "  ],\n",
      "  \"created\": 1680456621,\n",
      "  \"id\": \"cmpl-70vozowIIN2Dcy5lOGYaIiYWvFFmh\",\n",
      "  \"model\": \"text-davinci-003\",\n",
      "  \"object\": \"text_completion\",\n",
      "  \"usage\": {\n",
      "    \"completion_tokens\": 1198,\n",
      "    \"prompt_tokens\": 243,\n",
      "    \"total_tokens\": 1441\n",
      "  }\n",
      "}\n",
      "{'index_selected': 0, 'succeed_assertions': 1, 'success': 1, 'gen_cost': 0.000702, 'assertions': 'assert compare([1,2,3,4,5,1],[1,2,3,4,2,-2]) == [0,0,0,0,3,3]\\nassert compare([0,5,0,0,0,4],[4,1,1,0,0,-2]) == [4,4,1,0,0,6]'}\n"
     ]
    }
   ],
   "source": [
    "response = oai.Completion.create(context=tune_data[1], **config)\n",
    "print(response)\n",
    "print(eval_with_generated_assertions(oai.Completion.extract_text(response), **tune_data[1]))\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Evaluate the success rate on the test data\n",
    "\n",
    "You can use flaml's `oai.Completion.test` to evaluate the performance of an entire dataset with the tuned config. The following code will take a while to evaluate all the 144 test data instances. The cost is about $6 if you uncomment it and run it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2023-02-24T23:26:39.347295Z",
     "iopub.status.busy": "2023-02-24T23:26:39.346994Z",
     "iopub.status.idle": "2023-02-24T23:29:27.160335Z",
     "shell.execute_reply": "2023-02-24T23:29:27.159519Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "performance on test data with the tuned config: {'index_selected': 5.208333333333333, 'succeed_assertions': 0.8402777777777778, 'success': 0.7777777777777778, 'gen_cost': 0.00045375000000000005, 'cost': 5.785519999999999, 'inference_cost': 0.04017722222222222}\n"
     ]
    }
   ],
   "source": [
    "# result = oai.Completion.test(test_data, **config)\n",
    "# print(\"performance on test data with the tuned config:\", result)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The result will vary with the inference budget and optimization budget.\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.16"
  },
  "vscode": {
   "interpreter": {
    "hash": "949777d72b0d2535278d3dc13498b2535136f6dfe0678499012e853ee9abcab1"
   }
  },
  "widgets": {
   "application/vnd.jupyter.widget-state+json": {
    "state": {
     "24dd93300e0442788ee6cc1310e5bf14": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "2.0.0",
      "model_name": "HTMLStyleModel",
      "state": {
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "2.0.0",
       "_model_name": "HTMLStyleModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "2.0.0",
       "_view_name": "StyleView",
       "background": null,
       "description_width": "",
       "font_size": null,
       "text_color": null
      }
     },
     "35cd066a31b242bb87b2c106ee72e5f2": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "2.0.0",
      "model_name": "HBoxModel",
      "state": {
       "_dom_classes": [],
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "2.0.0",
       "_model_name": "HBoxModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/controls",
       "_view_module_version": "2.0.0",
       "_view_name": "HBoxView",
       "box_style": "",
       "children": [
        "IPY_MODEL_8e7ee7687a99410d88a98a74ecfcea99",
        "IPY_MODEL_421e02a11a974b40b3ddb75382b3b640",
        "IPY_MODEL_77db9797e78b49438d21c5c8da34b4cb"
       ],
       "layout": "IPY_MODEL_47d3046236a54b0e8f9ae455a82c7e0b",
       "tabbable": null,
       "tooltip": null
      }
     },
     "3d5d106a38954af2bb3bde5777702f4e": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "2.0.0",
      "model_name": "HTMLStyleModel",
      "state": {
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "2.0.0",
       "_model_name": "HTMLStyleModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "2.0.0",
       "_view_name": "StyleView",
       "background": null,
       "description_width": "",
       "font_size": null,
       "text_color": null
      }
     },
     "3e1ebb31412443b0bca86a301cbdac11": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "2.0.0",
      "model_name": "ProgressStyleModel",
      "state": {
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "2.0.0",
       "_model_name": "ProgressStyleModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "2.0.0",
       "_view_name": "StyleView",
       "bar_color": null,
       "description_width": ""
      }
     },
     "421e02a11a974b40b3ddb75382b3b640": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "2.0.0",
      "model_name": "FloatProgressModel",
      "state": {
       "_dom_classes": [],
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "2.0.0",
       "_model_name": "FloatProgressModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/controls",
       "_view_module_version": "2.0.0",
       "_view_name": "ProgressView",
       "bar_style": "success",
       "description": "",
       "description_allow_html": false,
       "layout": "IPY_MODEL_e6398d4027c9459a97965b9d91ae484f",
       "max": 1,
       "min": 0,
       "orientation": "horizontal",
       "style": "IPY_MODEL_3e1ebb31412443b0bca86a301cbdac11",
       "tabbable": null,
       "tooltip": null,
       "value": 1
      }
     },
     "47d3046236a54b0e8f9ae455a82c7e0b": {
      "model_module": "@jupyter-widgets/base",
      "model_module_version": "2.0.0",
      "model_name": "LayoutModel",
      "state": {
       "_model_module": "@jupyter-widgets/base",
       "_model_module_version": "2.0.0",
       "_model_name": "LayoutModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "2.0.0",
       "_view_name": "LayoutView",
       "align_content": null,
       "align_items": null,
       "align_self": null,
       "border_bottom": null,
       "border_left": null,
       "border_right": null,
       "border_top": null,
       "bottom": null,
       "display": null,
       "flex": null,
       "flex_flow": null,
       "grid_area": null,
       "grid_auto_columns": null,
       "grid_auto_flow": null,
       "grid_auto_rows": null,
       "grid_column": null,
       "grid_gap": null,
       "grid_row": null,
       "grid_template_areas": null,
       "grid_template_columns": null,
       "grid_template_rows": null,
       "height": null,
       "justify_content": null,
       "justify_items": null,
       "left": null,
       "margin": null,
       "max_height": null,
       "max_width": null,
       "min_height": null,
       "min_width": null,
       "object_fit": null,
       "object_position": null,
       "order": null,
       "overflow": null,
       "padding": null,
       "right": null,
       "top": null,
       "visibility": null,
       "width": null
      }
     },
     "754800f7feb04acea977696e4787d1ff": {
      "model_module": "@jupyter-widgets/base",
      "model_module_version": "2.0.0",
      "model_name": "LayoutModel",
      "state": {
       "_model_module": "@jupyter-widgets/base",
       "_model_module_version": "2.0.0",
       "_model_name": "LayoutModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "2.0.0",
       "_view_name": "LayoutView",
       "align_content": null,
       "align_items": null,
       "align_self": null,
       "border_bottom": null,
       "border_left": null,
       "border_right": null,
       "border_top": null,
       "bottom": null,
       "display": null,
       "flex": null,
       "flex_flow": null,
       "grid_area": null,
       "grid_auto_columns": null,
       "grid_auto_flow": null,
       "grid_auto_rows": null,
       "grid_column": null,
       "grid_gap": null,
       "grid_row": null,
       "grid_template_areas": null,
       "grid_template_columns": null,
       "grid_template_rows": null,
       "height": null,
       "justify_content": null,
       "justify_items": null,
       "left": null,
       "margin": null,
       "max_height": null,
       "max_width": null,
       "min_height": null,
       "min_width": null,
       "object_fit": null,
       "object_position": null,
       "order": null,
       "overflow": null,
       "padding": null,
       "right": null,
       "top": null,
       "visibility": null,
       "width": null
      }
     },
     "77db9797e78b49438d21c5c8da34b4cb": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "2.0.0",
      "model_name": "HTMLModel",
      "state": {
       "_dom_classes": [],
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "2.0.0",
       "_model_name": "HTMLModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/controls",
       "_view_module_version": "2.0.0",
       "_view_name": "HTMLView",
       "description": "",
       "description_allow_html": false,
       "layout": "IPY_MODEL_7b6c4e1c11e249409a1edcd63be450d8",
       "placeholder": "",
       "style": "IPY_MODEL_3d5d106a38954af2bb3bde5777702f4e",
       "tabbable": null,
       "tooltip": null,
       "value": " 1/1 [00:00&lt;00:00, 44.40it/s]"
      }
     },
     "7b6c4e1c11e249409a1edcd63be450d8": {
      "model_module": "@jupyter-widgets/base",
      "model_module_version": "2.0.0",
      "model_name": "LayoutModel",
      "state": {
       "_model_module": "@jupyter-widgets/base",
       "_model_module_version": "2.0.0",
       "_model_name": "LayoutModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "2.0.0",
       "_view_name": "LayoutView",
       "align_content": null,
       "align_items": null,
       "align_self": null,
       "border_bottom": null,
       "border_left": null,
       "border_right": null,
       "border_top": null,
       "bottom": null,
       "display": null,
       "flex": null,
       "flex_flow": null,
       "grid_area": null,
       "grid_auto_columns": null,
       "grid_auto_flow": null,
       "grid_auto_rows": null,
       "grid_column": null,
       "grid_gap": null,
       "grid_row": null,
       "grid_template_areas": null,
       "grid_template_columns": null,
       "grid_template_rows": null,
       "height": null,
       "justify_content": null,
       "justify_items": null,
       "left": null,
       "margin": null,
       "max_height": null,
       "max_width": null,
       "min_height": null,
       "min_width": null,
       "object_fit": null,
       "object_position": null,
       "order": null,
       "overflow": null,
       "padding": null,
       "right": null,
       "top": null,
       "visibility": null,
       "width": null
      }
     },
     "8e7ee7687a99410d88a98a74ecfcea99": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "2.0.0",
      "model_name": "HTMLModel",
      "state": {
       "_dom_classes": [],
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "2.0.0",
       "_model_name": "HTMLModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/controls",
       "_view_module_version": "2.0.0",
       "_view_name": "HTMLView",
       "description": "",
       "description_allow_html": false,
       "layout": "IPY_MODEL_754800f7feb04acea977696e4787d1ff",
       "placeholder": "",
       "style": "IPY_MODEL_24dd93300e0442788ee6cc1310e5bf14",
       "tabbable": null,
       "tooltip": null,
       "value": "100%"
      }
     },
     "e6398d4027c9459a97965b9d91ae484f": {
      "model_module": "@jupyter-widgets/base",
      "model_module_version": "2.0.0",
      "model_name": "LayoutModel",
      "state": {
       "_model_module": "@jupyter-widgets/base",
       "_model_module_version": "2.0.0",
       "_model_name": "LayoutModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "2.0.0",
       "_view_name": "LayoutView",
       "align_content": null,
       "align_items": null,
       "align_self": null,
       "border_bottom": null,
       "border_left": null,
       "border_right": null,
       "border_top": null,
       "bottom": null,
       "display": null,
       "flex": null,
       "flex_flow": null,
       "grid_area": null,
       "grid_auto_columns": null,
       "grid_auto_flow": null,
       "grid_auto_rows": null,
       "grid_column": null,
       "grid_gap": null,
       "grid_row": null,
       "grid_template_areas": null,
       "grid_template_columns": null,
       "grid_template_rows": null,
       "height": null,
       "justify_content": null,
       "justify_items": null,
       "left": null,
       "margin": null,
       "max_height": null,
       "max_width": null,
       "min_height": null,
       "min_width": null,
       "object_fit": null,
       "object_position": null,
       "order": null,
       "overflow": null,
       "padding": null,
       "right": null,
       "top": null,
       "visibility": null,
       "width": null
      }
     }
    },
    "version_major": 2,
    "version_minor": 0
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
 								{
 								 "cells": [
 								  {
 								   "attachments": {},
 								   "cell_type": "markdown",
 								   "metadata": {},
 								   "source": [
 								    "<a href=\"https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/autogen_openai_completion.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
 								   ]
 								  },
 								  {
 								   "attachments": {},
 								   "cell_type": "markdown",
 								   "metadata": {
 								    "slideshow": {
 								     "slide_type": "slide"
 								    }
 								   },
 								   "source": [
 								    "Copyright (c) Microsoft Corporation. All rights reserved. \n",
 								    "\n",
 								    "Licensed under the MIT License.\n",
 								    "\n",
 								    "# Use FLAML to Tune OpenAI Models\n",
 								    "\n",
 								    "FLAML offers a cost-effective hyperparameter optimization technique [EcoOptiGen](https://arxiv.org/abs/2303.04673) for tuning Large Language Models. Our study finds that tuning hyperparameters can significantly improve the utility of LLMs.\n",
 								    "\n",
 								    "In this notebook, we tune OpenAI models for code generation. We use [the HumanEval benchmark](https://huggingface.co/datasets/openai_humaneval) released by OpenAI for synthesizing programs from docstrings. \n",
 								    "\n",
 								    "## Requirements\n",
 								    "\n",
 								    "FLAML requires `Python>=3.7`. To run this notebook example, please install flaml with the [autogen,blendsearch] option:\n",
 								    "```bash\n",
 								    "pip install flaml[autogen,blendsearch]\n",
 								    "```"
 								   ]
 								  },
 								  {
 								   "cell_type": "code",
 								   "execution_count": 1,
 								   "metadata": {
 								    "execution": {
 								     "iopub.execute_input": "2023-02-24T23:25:36.910966Z",
 								     "iopub.status.busy": "2023-02-24T23:25:36.910473Z",
 								     "iopub.status.idle": "2023-02-24T23:25:36.914554Z",
 								     "shell.execute_reply": "2023-02-24T23:25:36.914030Z"
 								    }
 								   },
 								   "outputs": [],
 								   "source": [
 								    "# %pip install flaml[autogen,blendsearch] datasets"
 								   ]
 								  },
 								  {
 								   "attachments": {},
 								   "cell_type": "markdown",
 								   "metadata": {},
 								   "source": [
 								    "Set your OpenAI key:"
 								   ]
 								  },
 								  {
 								   "cell_type": "code",
 								   "execution_count": 2,
 								   "metadata": {
 								    "execution": {
 								     "iopub.execute_input": "2023-02-24T23:25:36.917301Z",
 								     "iopub.status.busy": "2023-02-24T23:25:36.917011Z",
 								     "iopub.status.idle": "2023-02-24T23:25:36.923156Z",
 								     "shell.execute_reply": "2023-02-24T23:25:36.922619Z"
 								    }
 								   },
 								   "outputs": [],
 								   "source": [
 								    "import os\n",
 								    "\n",
 								    "if \"OPENAI_API_KEY\" not in os.environ:\n",
 								    "    os.environ[\"OPENAI_API_KEY\"] = \"<your OpenAI API key here>\""
 								   ]
 								  },
 								  {
 								   "attachments": {},
 								   "cell_type": "markdown",
 								   "metadata": {},
 								   "source": [
 								    "If you use Azure OpenAI, uncomment the following:"
 								   ]
 								  },
 								  {
 								   "cell_type": "code",
 								   "execution_count": 3,
 								   "metadata": {
 								    "execution": {
 								     "iopub.execute_input": "2023-02-24T23:25:36.925804Z",
 								     "iopub.status.busy": "2023-02-24T23:25:36.925423Z",
 								     "iopub.status.idle": "2023-02-24T23:25:36.928191Z",
 								     "shell.execute_reply": "2023-02-24T23:25:36.927673Z"
 								    }
 								   },
 								   "outputs": [],
 								   "source": [
 								    "# import openai\n",
 								    "# openai.api_type = \"azure\"\n",
 								    "# openai.api_base = \"https://<your_endpoint>.openai.azure.com/\"\n",
 								    "# openai.api_version = \"2023-03-15-preview\"  # change if necessary"
 								   ]
 								  },
 								  {
 								   "attachments": {},
 								   "cell_type": "markdown",
 								   "metadata": {},
 								   "source": [
 								    "## Load dataset\n",
 								    "\n",
 								    "First, we load the humaneval dataset. The dataset contains 164 examples. We use the first 20 for tuning the generation hyperparameters and the remaining for evaluation. In each example, the \"prompt\" is the prompt string for eliciting the code generation (renamed into \"definition\"), \"test\" is the Python code for unit test for the example, and \"entry_point\" is the function name to be tested."
 								   ]
 								  },
 								  {
 								   "cell_type": "code",
 								   "execution_count": 4,
 								   "metadata": {
 								    "execution": {
 								     "iopub.execute_input": "2023-02-24T23:25:36.931255Z",
 								     "iopub.status.busy": "2023-02-24T23:25:36.930838Z",
 								     "iopub.status.idle": "2023-02-24T23:25:39.148799Z",
 								     "shell.execute_reply": "2023-02-24T23:25:39.148113Z"
 								    }
 								   },
 								   "outputs": [
 								    {
 								     "name": "stderr",
 								     "output_type": "stream",
 								     "text": [
 								      "Found cached dataset openai_humaneval (/home/vscode/.cache/huggingface/datasets/openai_humaneval/openai_humaneval/1.0.0/2955cebd73602e828fa8c0a424c594e5fab4ec863b316ca98f3d8fdb6a626e75)\n"
 								     ]
 								    },
 								    {
 								     "data": {
 								      "application/vnd.jupyter.widget-view+json": {
 								       "model_id": "0be40d7ad7f049f1946bd69b0c570f33",
 								       "version_major": 2,
 								       "version_minor": 0
 								      },
 								      "text/plain": [
 								       "  0%|          | 0/1 [00:00<?, ?it/s]"
 								      ]
 								     },
 								     "metadata": {},
 								     "output_type": "display_data"
 								    },
 								    {
 								     "name": "stderr",
 								     "output_type": "stream",
 								     "text": [
 								      "Loading cached shuffled indices for dataset at /home/vscode/.cache/huggingface/datasets/openai_humaneval/openai_humaneval/1.0.0/2955cebd73602e828fa8c0a424c594e5fab4ec863b316ca98f3d8fdb6a626e75/cache-1e8448101c1b32e8.arrow\n"
 								     ]
 								    }
 								   ],
 								   "source": [
 								    "import datasets\n",
 								    "\n",
 								    "seed = 41\n",
 								    "data = datasets.load_dataset(\"openai_humaneval\")[\"test\"].shuffle(seed=seed)\n",
 								    "n_tune_data = 20\n",
 								    "tune_data = [\n",
 								    "    {\n",
 								    "        \"definition\": data[x][\"prompt\"],\n",
 								    "        \"test\": data[x][\"test\"],\n",
 								    "        \"entry_point\": data[x][\"entry_point\"],\n",
 								    "    }\n",
 								    "    for x in range(n_tune_data)\n",
 								    "]\n",
 								    "test_data = [\n",
 								    "    {\n",
 								    "        \"definition\": data[x][\"prompt\"],\n",
 								    "        \"test\": data[x][\"test\"],\n",
 								    "        \"entry_point\": data[x][\"entry_point\"],\n",
 								    "    }\n",
 								    "    for x in range(n_tune_data, len(data))\n",
 								    "]\n"
 								   ]
 								  },
 								  {
 								   "attachments": {},
 								   "cell_type": "markdown",
 								   "metadata": {
 								    "slideshow": {
 								     "slide_type": "slide"
 								    }
 								   },
 								   "source": [
 								    "Check a tuning example:"
 								   ]
 								  },
 								  {
 								   "cell_type": "code",
 								   "execution_count": 5,
 								   "metadata": {
 								    "execution": {
 								     "iopub.execute_input": "2023-02-24T23:25:39.152156Z",
 								     "iopub.status.busy": "2023-02-24T23:25:39.151531Z",
 								     "iopub.status.idle": "2023-02-24T23:25:39.155313Z",
 								     "shell.execute_reply": "2023-02-24T23:25:39.154731Z"
 								    },
 								    "slideshow": {
 								     "slide_type": "subslide"
 								    },
 								    "tags": []
 								   },
 								   "outputs": [
 								    {
 								     "name": "stdout",
 								     "output_type": "stream",
 								     "text": [
 								      "\n",
 								      "def compare(game,guess):\n",
 								      "    \"\"\"I think we all remember that feeling when the result of some long-awaited\n",
 								      "    event is finally known. The feelings and thoughts you have at that moment are\n",
 								      "    definitely worth noting down and comparing.\n",
 								      "    Your task is to determine if a person correctly guessed the results of a number of matches.\n",
 								      "    You are given two arrays of scores and guesses of equal length, where each index shows a match. \n",
 								      "    Return an array of the same length denoting how far off each guess was. If they have guessed correctly,\n",
 								      "    the value is 0, and if not, the value is the absolute difference between the guess and the score.\n",
 								      "    \n",
 								      "    \n",
 								      "    example:\n",
 								      "\n",
 								      "    compare([1,2,3,4,5,1],[1,2,3,4,2,-2]) -> [0,0,0,0,3,3]\n",
 								      "    compare([0,5,0,0,0,4],[4,1,1,0,0,-2]) -> [4,4,1,0,0,6]\n",
 								      "    \"\"\"\n",
 								      "\n"
 								     ]
 								    }
 								   ],
 								   "source": [
 								    "print(tune_data[1][\"definition\"])"
 								   ]
 								  },
 								  {
 								   "attachments": {},
 								   "cell_type": "markdown",
 								   "metadata": {},
 								   "source": [
 								    "Here is one example of the unit test code for verifying the correctness of the generated code:"
 								   ]
 								  },
 								  {
 								   "cell_type": "code",
 								   "execution_count": 6,
 								   "metadata": {
 								    "execution": {
 								     "iopub.execute_input": "2023-02-24T23:25:39.158398Z",
 								     "iopub.status.busy": "2023-02-24T23:25:39.157766Z",
 								     "iopub.status.idle": "2023-02-24T23:25:39.161396Z",
 								     "shell.execute_reply": "2023-02-24T23:25:39.160797Z"
 								    }
 								   },
 								   "outputs": [
 								    {
 								     "name": "stdout",
 								     "output_type": "stream",
 								     "text": [
 								      "def check(candidate):\n",
 								      "\n",
 								      "    # Check some simple cases\n",
 								      "    assert candidate([1,2,3,4,5,1],[1,2,3,4,2,-2])==[0,0,0,0,3,3], \"This prints if this assert fails 1 (good for debugging!)\"\n",
 								      "    assert candidate([0,0,0,0,0,0],[0,0,0,0,0,0])==[0,0,0,0,0,0], \"This prints if this assert fails 1 (good for debugging!)\"\n",
 								      "    assert candidate([1,2,3],[-1,-2,-3])==[2,4,6], \"This prints if this assert fails 1 (good for debugging!)\"\n",
 								      "    assert candidate([1,2,3,5],[-1,2,3,4])==[2,0,0,1], \"This prints if this assert fails 1 (good for debugging!)\"\n",
 								      "\n",
 								      "    # Check some edge cases that are easy to work out by hand.\n",
 								      "    assert True, \"This prints if this assert fails 2 (also good for debugging!)\"\n",
 								      "\n",
 								      "\n"
 								     ]
 								    }
 								   ],
 								   "source": [
 								    "print(tune_data[1][\"test\"])"
 								   ]
 								  },
 								  {
 								   "attachments": {},
 								   "cell_type": "markdown",
 								   "metadata": {},
 								   "source": [
 								    "## Define Success Metric\n",
 								    "\n",
 								    "Before we start tuning, we need to define the success metric we want to optimize. For each code generation task, we can use the model to generate multiple candidates, and then select one from them. If the final selected response can pass a unit test, we consider the task as successfully solved. Then we can define the mean success rate of a collection of tasks."
 								   ]
 								  },
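 								  {
 								   "attachments": {},
 								   "cell_type": "markdown",
 								   "metadata": {},
 								   "source": [
 								    "As a minimal sketch of this metric (not FLAML's implementation), assume each task yields a dict with a 0/1 `success` entry, mirroring the `success` key that appears in the evaluation output of this notebook. The helper name `mean_success_rate` and the sample results are hypothetical and only illustrate the averaging step.\n",
 								    "\n",
 								    "```python\n",
 								    "def mean_success_rate(task_results):\n",
 								    "    # task_results: one dict per task with a 0/1 'success' entry (assumed shape)\n",
 								    "    if not task_results:\n",
 								    "        return 0.0\n",
 								    "    return sum(r['success'] for r in task_results) / len(task_results)\n",
 								    "\n",
 								    "\n",
 								    "# Three tasks: the selected response passed the unit test for two of them.\n",
 								    "results = [{'success': 1}, {'success': 0}, {'success': 1}]\n",
 								    "rate = mean_success_rate(results)\n",
 								    "```"
 								   ]
 								  },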
 								  {
 								   "cell_type": "code",
 								   "execution_count": 7,
 								   "metadata": {
 								    "execution": {
 								     "iopub.execute_input": "2023-02-24T23:25:39.164187Z",
 								     "iopub.status.busy": "2023-02-24T23:25:39.163867Z",
 								     "iopub.status.idle": "2023-02-24T23:25:39.169009Z",
 								     "shell.execute_reply": "2023-02-24T23:25:39.168427Z"
 								    }
 								   },
 								   "outputs": [],
 								   "source": [
 								    "from functools import partial\n",
 								    "from flaml.autogen.code_utils import eval_function_completions, generate_assertions\n",
 								    "\n",
 								    "eval_with_generated_assertions = partial(\n",
 								    "    eval_function_completions,\n",
 								    "    assertions=generate_assertions,\n",
 								    "    use_docker=False,\n",
 								    "    # Please set use_docker=True if you have docker available to run the generated code.\n",
 								    "    # Using docker is safer than running the generated code directly.\n",
 								    ")\n"
 								   ]
 								  },
 								  {
 								   "attachments": {},
 								   "cell_type": "markdown",
 								   "metadata": {
 								    "slideshow": {
 								     "slide_type": "slide"
 								    }
 								   },
 								   "source": [
 								    "This function will first generate assertion statements for each problem. Then, it uses the assertions to select the generated responses.\n",
 								    "\n",
 								    "## Use the tuning data to find a good configuration\n",
 								    "\n",
 								    "### Import the oai and tune subpackages from flaml.\n",
 								    "\n",
 								    "FLAML has provided an API for hyperparameter optimization of OpenAI models: `oai.Completion.tune` and to make a request with the tuned config: `oai.Completion.create`. First, we import oai from flaml:"
 								   ]
 								  },
 								  {
 								   "cell_type": "code",
 								   "execution_count": 8,
 								   "metadata": {
 								    "execution": {
 								     "iopub.execute_input": "2023-02-24T23:25:39.179030Z",
 								     "iopub.status.busy": "2023-02-24T23:25:39.178624Z",
 								     "iopub.status.idle": "2023-02-24T23:25:40.584410Z",
 								     "shell.execute_reply": "2023-02-24T23:25:40.583802Z"
 								    },
 								    "slideshow": {
 								     "slide_type": "slide"
 								    }
 								   },
 								   "outputs": [],
 								   "source": [
 								    "from flaml import oai"
 								   ]
 								  },
 								  {
 								   "attachments": {},
 								   "cell_type": "markdown",
 								   "metadata": {},
 								   "source": [
 								    "For (local) reproducibility and cost efficiency, we cache responses from OpenAI."
 								   ]
 								  },
 								  {
 								   "cell_type": "code",
 								   "execution_count": 9,
 								   "metadata": {
 								    "execution": {
 								     "iopub.execute_input": "2023-02-24T23:25:40.587815Z",
 								     "iopub.status.busy": "2023-02-24T23:25:40.587283Z",
 								     "iopub.status.idle": "2023-02-24T23:25:40.590826Z",
 								     "shell.execute_reply": "2023-02-24T23:25:40.590158Z"
 								    },
 								    "slideshow": {
 								     "slide_type": "slide"
 								    }
 								   },
 								   "outputs": [],
 								   "source": [
 								    "oai.Completion.set_cache(seed)"
 								   ]
 								  },
 								  {
 								   "attachments": {},
 								   "cell_type": "markdown",
 								   "metadata": {},
 								   "source": [
 								    "This will create a disk cache in \".cache/{seed}\". You can change `cache_path` in `set_cache()`. The cache for different seeds are stored separately.\n",
 								    "\n",
 								    "### Perform tuning\n",
 								    "\n",
 								    "The tuning will take a while to finish, depending on the optimization budget. The tuning will be performed under the specified optimization budgets.\n",
 								    "\n",
 								    "* `inference_budget` is the target average inference budget per instance in the benchmark. For example, 0.02 means the target inference budget is 0.02 dollars, which translates to 1000 tokens (input + output combined) if the text Davinci model is used.\n",
 								    "* `optimization_budget` is the total budget allowed to perform the tuning. For example, 5 means 5 dollars are allowed in total, which translates to 250K tokens for the text Davinci model.\n",
 								    "* `num_sumples` is the number of different hyperparameter configurations which is allowed to try. The tuning will stop after either num_samples trials or after optimization_budget dollars spent, whichever happens first. -1 means no hard restriction in the number of trials and the actual number is decided by `optimization_budget`.\n",
 								    "\n",
 								    "Users can specify tuning data, optimization metric, optimization mode, evaluation function, search spaces etc.. The default search space is:\n",
 								    "\n",
 								    "```python\n",
 								    "default_search_space = {\n",
 								    "    \"model\": tune.choice([\n",
 								    "        \"text-ada-001\",\n",
 								    "        \"text-babbage-001\",\n",
 								    "        \"text-davinci-003\",\n",
 								    "        \"gpt-3.5-turbo\",\n",
 								    "        \"gpt-4\",\n",
 								    "    ]),\n",
 								    "    \"temperature_or_top_p\": tune.choice(\n",
 								    "        [\n",
 								    "            {\"temperature\": tune.uniform(0, 1)},\n",
 								    "            {\"top_p\": tune.uniform(0, 1)},\n",
 								    "        ]\n",
 								    "    ),\n",
 								    "    \"max_tokens\": tune.lograndint(50, 1000),\n",
 								    "    \"n\": tune.randint(1, 100),\n",
 								    "    \"prompt\": \"{prompt}\",\n",
 								    "}\n",
 								    "```\n",
 								    "\n",
 								    "The default search space can be overridden by users' input.\n",
 								    "For example, the following code specifies three choices for the prompt and two choices of stop sequences. For hyperparameters which don't appear in users' input, the default search space will be used. If you don't have access to gpt-4 or would like to modify the choice of models, you can provide a different search space for model."
 								   ]
 								  },
 								  {
 								   "cell_type": "code",
 								   "execution_count": 14,
 								   "metadata": {
 								    "execution": {
 								     "iopub.execute_input": "2023-02-24T23:25:40.593603Z",
 								     "iopub.status.busy": "2023-02-24T23:25:40.593269Z",
 								     "iopub.status.idle": "2023-02-24T23:26:38.349191Z",
 								     "shell.execute_reply": "2023-02-24T23:26:38.348392Z"
 								    }
 								   },
 								   "outputs": [
 								    {
 								     "name": "stderr",
 								     "output_type": "stream",
 								     "text": [
 								      "\u001b[32m[I 2023-04-07 17:47:31,801]\u001b[0m A new study created in memory with name: optuna\u001b[0m\n",
 								      "\u001b[32m[I 2023-04-07 17:47:31,804]\u001b[0m A new study created in memory with name: optuna\u001b[0m\n"
 								     ]
 								    },
 								    {
 								     "name": "stdout",
 								     "output_type": "stream",
 								     "text": [
 								      "[flaml.tune.tune: 04-07 17:47:31] {832} INFO - trial 1 config: {'prompt': 1, 'stop': 0, 'subspace': {'model': 'text-ada-001', 'max_tokens': 148, 'temperature_or_top_p': {'top_p': 0.755486898036596}, 'n': 27}}\n",
 								      "[flaml.tune.tune: 04-07 17:47:48] {215} INFO - result: {'index_selected': 26.0, 'succeed_assertions': 0.0, 'success': 0.0, 'gen_cost': 0.00046369999999999994, 'assertions': 'assert vowels_count(\"abcde\") == 2\\nassert vowels_count(\"ACEDY\") == 3', 'total_cost': 0.010323600000000004, 'cost': 0.010323600000000004, 'inference_cost': 0.00022578, 'training_iteration': 0, 'config': {'prompt': 1, 'stop': 0, 'subspace': {'model': 'text-ada-001', 'max_tokens': 148, 'temperature_or_top_p': {'top_p': 0.755486898036596}, 'n': 27}}, 'config/prompt': 1, 'config/stop': 0, 'config/subspace': {'model': 'text-ada-001', 'max_tokens': 148, 'temperature_or_top_p': {'top_p': 0.755486898036596}, 'n': 27}, 'experiment_tag': 'exp', 'time_total_s': 16.660529136657715}\n",
 								      "[flaml.tune.tune: 04-07 17:47:48] {832} INFO - trial 2 config: {'prompt': 1, 'stop': 0, 'subspace': {'model': 'text-babbage-001', 'max_tokens': 148, 'temperature_or_top_p': {'top_p': 0.755486898036596}, 'n': 27}}\n",
 								      "[flaml.tune.tune: 04-07 17:48:05] {215} INFO - result: {'index_selected': 26.0, 'succeed_assertions': 0.0, 'success': 0.0, 'gen_cost': 0.00046369999999999994, 'assertions': 'assert vowels_count(\"abcde\") == 2\\nassert vowels_count(\"ACEDY\") == 3', 'total_cost': 0.03038410000000001, 'cost': 0.020060500000000002, 'inference_cost': 0.001003025, 'training_iteration': 0, 'config': {'prompt': 1, 'stop': 0, 'subspace': {'model': 'text-babbage-001', 'max_tokens': 148, 'temperature_or_top_p': {'top_p': 0.755486898036596}, 'n': 27}}, 'config/prompt': 1, 'config/stop': 0, 'config/subspace': {'model': 'text-babbage-001', 'max_tokens': 148, 'temperature_or_top_p': {'top_p': 0.755486898036596}, 'n': 27}, 'experiment_tag': 'exp', 'time_total_s': 16.726527452468872}\n",
 								      "[flaml.tune.tune: 04-07 17:48:05] {832} INFO - trial 3 config: {'prompt': 1, 'stop': 0, 'subspace': {'model': 'text-davinci-003', 'max_tokens': 148, 'temperature_or_top_p': {'top_p': 0.755486898036596}, 'n': 27}}\n",
 								      "[flaml.tune.tune: 04-07 17:48:08] {215} INFO - result: {'index_selected': 3.95, 'succeed_assertions': 0.9, 'success': 0.75, 'gen_cost': 0.00046369999999999994, 'assertions': 'assert vowels_count(\"abcde\") == 2\\nassert vowels_count(\"ACEDY\") == 3', 'total_cost': 0.8871640999999999, 'cost': 0.8567799999999999, 'inference_cost': 0.042096, 'training_iteration': 0, 'config': {'prompt': 1, 'stop': 0, 'subspace': {'model': 'text-davinci-003', 'max_tokens': 148, 'temperature_or_top_p': {'top_p': 0.755486898036596}, 'n': 27}}, 'config/prompt': 1, 'config/stop': 0, 'config/subspace': {'model': 'text-davinci-003', 'max_tokens': 148, 'temperature_or_top_p': {'top_p': 0.755486898036596}, 'n': 27}, 'experiment_tag': 'exp', 'time_total_s': 3.7132015228271484}\n",
 								      "[flaml.tune.tune: 04-07 17:48:08] {832} INFO - trial 4 config: {'prompt': 1, 'stop': 0, 'subspace': {'model': 'gpt-3.5-turbo', 'max_tokens': 148, 'temperature_or_top_p': {'top_p': 0.755486898036596}, 'n': 27}}\n",
 								      "[flaml.tune.tune: 04-07 17:48:18] {215} INFO - result: {'index_selected': 13.85, 'succeed_assertions': 0.55, 'success': 0.5, 'gen_cost': 0.00046369999999999994, 'assertions': 'assert vowels_count(\"abcde\") == 2\\nassert vowels_count(\"ACEDY\") == 3', 'total_cost': 0.9526220999999998, 'cost': 0.065458, 'inference_cost': 0.0033335, 'training_iteration': 0, 'config': {'prompt': 1, 'stop': 0, 'subspace': {'model': 'gpt-3.5-turbo', 'max_tokens': 148, 'temperature_or_top_p': {'top_p': 0.755486898036596}, 'n': 27}}, 'config/prompt': 1, 'config/stop': 0, 'config/subspace': {'model': 'gpt-3.5-turbo', 'max_tokens': 148, 'temperature_or_top_p': {'top_p': 0.755486898036596}, 'n': 27}, 'experiment_tag': 'exp', 'time_total_s': 9.689077615737915}\n",
 								      "[flaml.tune.tune: 04-07 17:48:18] {832} INFO - trial 5 config: {'prompt': 1, 'stop': 0, 'subspace': {'model': 'gpt-4', 'max_tokens': 148, 'temperature_or_top_p': {'top_p': 0.755486898036596}, 'n': 27}}\n",
 								      "[flaml.tune.tune: 04-07 17:48:18] {215} INFO - result: {'success': 0, 'total_cost': 1.0297820999999998, 'cost': 0.07715999999999999, 'training_iteration': 0, 'config': {'prompt': 1, 'stop': 0, 'subspace': {'model': 'gpt-4', 'max_tokens': 148, 'temperature_or_top_p': {'top_p': 0.755486898036596}, 'n': 27}}, 'config/prompt': 1, 'config/stop': 0, 'config/subspace': {'model': 'gpt-4', 'max_tokens': 148, 'temperature_or_top_p': {'top_p': 0.755486898036596}, 'n': 27}, 'experiment_tag': 'exp', 'time_total_s': 0.002007722854614258}\n",
 								      "[flaml.tune.tune: 04-07 17:48:18] {855} WARNING - fail to sample a trial for 100 times in a row, stopping.\n"
 								     ]
 								    }
 								   ],
 								   "source": [
 								    "config, analysis = oai.Completion.tune(\n",
 								    "    data=tune_data,  # the data for tuning\n",
 								    "    metric=\"success\",  # the metric to optimize\n",
 								    "    mode=\"max\",  # the optimization mode\n",
 								    "    eval_func=eval_with_generated_assertions,  # the evaluation function to return the success metrics\n",
 								    "    # log_file_name=\"logs/humaneval.log\",  # the log file name\n",
 								    "    inference_budget=0.05,  # the inference budget (dollar per instance)\n",
 								    "    optimization_budget=1,  # the optimization budget (dollar in total)\n",
 								    "    # num_samples can further limit the number of trials for different hyperparameter configurations;\n",
 								    "    # -1 means decided by the optimization budget only\n",
 								    "    num_samples=-1,\n",
 								    "    prompt=[\n",
 								    "        \"{definition}\",\n",
 								    "        \"# Python 3{definition}\",\n",
 								    "        \"Complete the following Python function:{definition}\",\n",
 								    "    ],  # the prompt templates to choose from\n",
 								    "    stop=[[\"\\nclass\", \"\\ndef\", \"\\nif\", \"\\nprint\"], None],  # the stop sequences\n",
 								    ")\n"
 								   ]
 								  },
 								  {
 								   "attachments": {},
 								   "cell_type": "markdown",
 								   "metadata": {},
 								   "source": [
 								    "### Output tuning results\n",
 								    "\n",
 								    "After the tuning, we can print out the config and the result found by FLAML:"
 								   ]
 								  },
 								  {
 								   "cell_type": "code",
 								   "execution_count": 15,
 								   "metadata": {
 								    "execution": {
 								     "iopub.execute_input": "2023-02-24T23:26:38.352710Z",
 								     "iopub.status.busy": "2023-02-24T23:26:38.352378Z",
 								     "iopub.status.idle": "2023-02-24T23:26:38.356939Z",
 								     "shell.execute_reply": "2023-02-24T23:26:38.356217Z"
 								    }
 								   },
 								   "outputs": [
 								    {
 								     "name": "stdout",
 								     "output_type": "stream",
 								     "text": [
 								      "optimized config {'prompt': '# Python 3{definition}', 'stop': ['\\nclass', '\\ndef', '\\nif', '\\nprint'], 'model': 'text-davinci-003', 'max_tokens': 148, 'n': 27, 'top_p': 0.755486898036596}\n",
 								      "best result on tuning data {'index_selected': 3.95, 'succeed_assertions': 0.9, 'success': 0.75, 'gen_cost': 0.00046369999999999994, 'assertions': 'assert vowels_count(\"abcde\") == 2\\nassert vowels_count(\"ACEDY\") == 3', 'total_cost': 0.8871640999999999, 'cost': 0.8567799999999999, 'inference_cost': 0.042096, 'training_iteration': 0, 'config': {'prompt': 1, 'stop': 0, 'subspace': {'model': 'text-davinci-003', 'max_tokens': 148, 'temperature_or_top_p': {'top_p': 0.755486898036596}, 'n': 27}}, 'config/prompt': 1, 'config/stop': 0, 'config/subspace': {'model': 'text-davinci-003', 'max_tokens': 148, 'temperature_or_top_p': {'top_p': 0.755486898036596}, 'n': 27}, 'experiment_tag': 'exp', 'time_total_s': 3.7132015228271484}\n"
 								     ]
 								    }
 								   ],
 								   "source": [
 								    "print(\"optimized config\", config)\n",
 								    "print(\"best result on tuning data\", analysis.best_result)"
 								   ]
 								  },
 								  {
 								   "attachments": {},
 								   "cell_type": "markdown",
 								   "metadata": {
 								    "slideshow": {
 								     "slide_type": "slide"
 								    }
 								   },
 								   "source": [
 								    "### Make a request with the tuned config\n",
 								    "\n",
 								    "We can apply the tuned config on the request for an example task:"
 								   ]
 								  },
 								  {
 								   "cell_type": "code",
 								   "execution_count": 16,
 								   "metadata": {
 								    "execution": {
 								     "iopub.execute_input": "2023-02-24T23:26:38.359902Z",
 								     "iopub.status.busy": "2023-02-24T23:26:38.359506Z",
 								     "iopub.status.idle": "2023-02-24T23:26:39.343921Z",
 								     "shell.execute_reply": "2023-02-24T23:26:39.343051Z"
 								    },
 								    "slideshow": {
 								     "slide_type": "subslide"
 								    },
 								    "tags": []
 								   },
 								   "outputs": [
 								    {
 								     "name": "stdout",
 								     "output_type": "stream",
 								     "text": [
 								      "{\n",
 								      "  \"choices\": [\n",
 								      "    {\n",
 								      "      \"finish_reason\": \"stop\",\n",
 								      "      \"index\": 0,\n",
 								      "      \"logprobs\": null,\n",
 								      "      \"text\": \"    result = []\\n    for i in range(len(game)):\\n        result.append(abs(game[i]-guess[i]))\\n    return result\"\n",
 								      "    },\n",
 								      "    {\n",
 								      "      \"finish_reason\": \"stop\",\n",
 								      "      \"index\": 1,\n",
 								      "      \"logprobs\": null,\n",
 								      "      \"text\": \"    result = []\\n    for i in range(len(game)):\\n        result.append(abs(game[i] - guess[i]))\\n    return result\"\n",
 								      "    },\n",
 								      "    {\n",
 								      "      \"finish_reason\": \"stop\",\n",
 								      "      \"index\": 2,\n",
 								      "      \"logprobs\": null,\n",
 								      "      \"text\": \"    result = []\\n    for i in range(len(game)):\\n        diff = abs(game[i] - guess[i])\\n        result.append(diff)\\n    return result\"\n",
 								      "    },\n",
 								      "    {\n",
 								      "      \"finish_reason\": \"stop\",\n",
 								      "      \"index\": 3,\n",
 								      "      \"logprobs\": null,\n",
 								      "      \"text\": \"    result = []\\n    for i in range(len(game)):\\n        result.append(abs(game[i] - guess[i]))\\n    return result\"\n",
 								      "    },\n",
 								      "    {\n",
 								      "      \"finish_reason\": \"stop\",\n",
 								      "      \"index\": 4,\n",
 								      "      \"logprobs\": null,\n",
 								      "      \"text\": \"    result = []\\n    for i in range(len(game)):\\n        diff = abs(game[i] - guess[i])\\n        result.append(diff)\\n    return result\"\n",
 								      "    },\n",
 								      "    {\n",
 								      "      \"finish_reason\": \"stop\",\n",
 								      "      \"index\": 5,\n",
 								      "      \"logprobs\": null,\n",
 								      "      \"text\": \"    result = []\\n    for i in range(len(game)):\\n        result.append(abs(game[i] - guess[i]))\\n    return result\"\n",
 								      "    },\n",
 								      "    {\n",
 								      "      \"finish_reason\": \"stop\",\n",
 								      "      \"index\": 6,\n",
 								      "      \"logprobs\": null,\n",
 								      "      \"text\": \"    results = []\\n    for i in range(len(game)):\\n        if game[i] == guess[i]:\\n            results.append(0)\\n        else:\\n            results.append(abs(game[i] - guess[i]))\\n    return results\"\n",
 								      "    },\n",
 								      "    {\n",
 								      "      \"finish_reason\": \"stop\",\n",
 								      "      \"index\": 7,\n",
 								      "      \"logprobs\": null,\n",
 								      "      \"text\": \"    res = []\\n    for i in range(len(game)):\\n        res.append(abs(game[i] - guess[i]))\\n    return res\"\n",
 								      "    },\n",
 								      "    {\n",
 								      "      \"finish_reason\": \"stop\",\n",
 								      "      \"index\": 8,\n",
 								      "      \"logprobs\": null,\n",
 								      "      \"text\": \"    result = []\\n    for i in range(len(game)):\\n        result.append(abs(game[i] - guess[i]))\\n    return result\"\n",
 								      "    },\n",
 								      "    {\n",
 								      "      \"finish_reason\": \"stop\",\n",
 								      "      \"index\": 9,\n",
 								      "      \"logprobs\": null,\n",
 								      "      \"text\": \"    result = []\\n    for i in range(len(game)):\\n        result.append(abs(game[i]-guess[i]))\\n    return result\"\n",
 								      "    },\n",
 								      "    {\n",
 								      "      \"finish_reason\": \"stop\",\n",
 								      "      \"index\": 10,\n",
 								      "      \"logprobs\": null,\n",
 								      "      \"text\": \"    result = []\\n    for i in range(len(game)):\\n        result.append(abs(game[i] - guess[i]))\\n    return result\"\n",
 								      "    },\n",
 								      "    {\n",
 								      "      \"finish_reason\": \"stop\",\n",
 								      "      \"index\": 11,\n",
 								      "      \"logprobs\": null,\n",
 								      "      \"text\": \"    result = []\\n    for i in range(len(game)):\\n        diff = abs(game[i] - guess[i])\\n        result.append(diff)\\n    return result\"\n",
 								      "    },\n",
 								      "    {\n",
 								      "      \"finish_reason\": \"stop\",\n",
 								      "      \"index\": 12,\n",
 								      "      \"logprobs\": null,\n",
 								      "      \"text\": \"    result = []\\n    for i in range(len(game)):\\n        if game[i] == guess[i]:\\n            result.append(0)\\n        else:\\n            result.append(abs(game[i] - guess[i]))\\n    return result\"\n",
 								      "    },\n",
 								      "    {\n",
 								      "      \"finish_reason\": \"stop\",\n",
 								      "      \"index\": 13,\n",
 								      "      \"logprobs\": null,\n",
 								      "      \"text\": \"    #your code here\\n    result = []\\n    for i in range(len(game)):\\n        if game[i] == guess[i]:\\n            result.append(0)\\n        else:\\n            result.append(abs(game[i] - guess[i]))\\n    return result\"\n",
 								      "    },\n",
 								      "    {\n",
 								      "      \"finish_reason\": \"stop\",\n",
 								      "      \"index\": 14,\n",
 								      "      \"logprobs\": null,\n",
 								      "      \"text\": \"    result = []\\n    for i in range(len(game)):\\n        result.append(abs(game[i] - guess[i]))\\n    return result\"\n",
 								      "    },\n",
 								      "    {\n",
 								      "      \"finish_reason\": \"stop\",\n",
 								      "      \"index\": 15,\n",
 								      "      \"logprobs\": null,\n",
 								      "      \"text\": \"    result = []\\n    for i in range(len(game)):\\n        diff = abs(game[i] - guess[i])\\n        result.append(diff)\\n    return result\"\n",
 								      "    },\n",
 								      "    {\n",
 								      "      \"finish_reason\": \"stop\",\n",
 								      "      \"index\": 16,\n",
 								      "      \"logprobs\": null,\n",
 								      "      \"text\": \"    result = []\\n    for i in range(len(game)):\\n        result.append(abs(game[i]-guess[i]))\\n    return result\"\n",
 								      "    },\n",
 								      "    {\n",
 								      "      \"finish_reason\": \"stop\",\n",
 								      "      \"index\": 17,\n",
 								      "      \"logprobs\": null,\n",
 								      "      \"text\": \"    result = []\\n    for i in range(len(game)):\\n        result.append(abs(game[i] - guess[i]))\\n    return result\"\n",
 								      "    },\n",
 								      "    {\n",
 								      "      \"finish_reason\": \"stop\",\n",
 								      "      \"index\": 18,\n",
 								      "      \"logprobs\": null,\n",
 								      "      \"text\": \"    # Your code here\\n    result = []\\n    for i in range(len(game)):\\n        result.append(abs(game[i] - guess[i]))\\n    return result\"\n",
 								      "    },\n",
 								      "    {\n",
 								      "      \"finish_reason\": \"stop\",\n",
 								      "      \"index\": 19,\n",
 								      "      \"logprobs\": null,\n",
 								      "      \"text\": \"    result = []\\n    for i in range(len(game)):\\n        result.append(abs(game[i] - guess[i]))\\n    return result\"\n",
 								      "    },\n",
 								      "    {\n",
 								      "      \"finish_reason\": \"stop\",\n",
 								      "      \"index\": 20,\n",
 								      "      \"logprobs\": null,\n",
 								      "      \"text\": \"    #create an empty list\\n    result = []\\n    #iterate over the two lists and compare the values\\n    for i in range(len(game)):\\n        diff = abs(game[i] - guess[i])\\n        result.append(diff)\\n    return result\"\n",
 								      "    },\n",
 								      "    {\n",
 								      "      \"finish_reason\": \"stop\",\n",
 								      "      \"index\": 21,\n",
 								      "      \"logprobs\": null,\n",
 								      "      \"text\": \"    result = []\\n    for i in range(len(game)):\\n        result.append(abs(game[i] - guess[i]))\\n    return result\"\n",
 								      "    },\n",
 								      "    {\n",
 								      "      \"finish_reason\": \"stop\",\n",
 								      "      \"index\": 22,\n",
 								      "      \"logprobs\": null,\n",
 								      "      \"text\": \"    # initialize the result array\\n    result = []\\n    \\n    # loop over the arrays and calculate the difference\\n    for i in range(len(game)):\\n        diff = abs(game[i] - guess[i])\\n        result.append(diff)\\n    \\n    return result\"\n",
 								      "    },\n",
 								      "    {\n",
 								      "      \"finish_reason\": \"stop\",\n",
 								      "      \"index\": 23,\n",
 								      "      \"logprobs\": null,\n",
 								      "      \"text\": \"    result = []\\n    for i in range(len(game)):\\n        result.append(abs(game[i]-guess[i]))\\n    return result\"\n",
 								      "    },\n",
 								      "    {\n",
 								      "      \"finish_reason\": \"stop\",\n",
 								      "      \"index\": 24,\n",
 								      "      \"logprobs\": null,\n",
 								      "      \"text\": \"    result = []\\n    for i in range(len(game)):\\n        diff = abs(game[i] - guess[i])\\n        result.append(diff)\\n    return result\"\n",
 								      "    },\n",
 								      "    {\n",
 								      "      \"finish_reason\": \"stop\",\n",
 								      "      \"index\": 25,\n",
 								      "      \"logprobs\": null,\n",
 								      "      \"text\": \"    # Your code here\\n    result = []\\n    for i in range(len(game)):\\n        diff = abs(game[i] - guess[i])\\n        result.append(diff)\\n    return result\"\n",
 								      "    },\n",
 								      "    {\n",
 								      "      \"finish_reason\": \"stop\",\n",
 								      "      \"index\": 26,\n",
 								      "      \"logprobs\": null,\n",
 								      "      \"text\": \"    result = []\\n    for i in range(len(game)):\\n        result.append(abs(game[i]-guess[i]))\\n    return result\"\n",
 								      "    }\n",
 								      "  ],\n",
 								      "  \"created\": 1680456621,\n",
 								      "  \"id\": \"cmpl-70vozowIIN2Dcy5lOGYaIiYWvFFmh\",\n",
 								      "  \"model\": \"text-davinci-003\",\n",
 								      "  \"object\": \"text_completion\",\n",
 								      "  \"usage\": {\n",
 								      "    \"completion_tokens\": 1198,\n",
 								      "    \"prompt_tokens\": 243,\n",
 								      "    \"total_tokens\": 1441\n",
 								      "  }\n",
 								      "}\n",
 								      "{'index_selected': 0, 'succeed_assertions': 1, 'success': 1, 'gen_cost': 0.000702, 'assertions': 'assert compare([1,2,3,4,5,1],[1,2,3,4,2,-2]) == [0,0,0,0,3,3]\\nassert compare([0,5,0,0,0,4],[4,1,1,0,0,-2]) == [4,4,1,0,0,6]'}\n"
 								     ]
 								    }
 								   ],
 								   "source": [
 								    "response = oai.Completion.create(context=tune_data[1], **config)\n",
 								    "print(response)\n",
 								    "print(eval_with_generated_assertions(oai.Completion.extract_text(response), **tune_data[1]))\n"
 								   ]
 								  },
 								  {
 								   "attachments": {},
 								   "cell_type": "markdown",
 								   "metadata": {},
 								   "source": [
 								    "### Evaluate the success rate on the test data\n",
 								    "\n",
 								    "You can use FLAML's `oai.Completion.test` to evaluate the tuned config on an entire dataset. The following code takes a while to evaluate all 144 test instances, and costs about $6 if you uncomment and run it."
 								   ]
 								  },
 								  {
 								   "cell_type": "code",
 								   "execution_count": 18,
 								   "metadata": {
 								    "execution": {
 								     "iopub.execute_input": "2023-02-24T23:26:39.347295Z",
 								     "iopub.status.busy": "2023-02-24T23:26:39.346994Z",
 								     "iopub.status.idle": "2023-02-24T23:29:27.160335Z",
 								     "shell.execute_reply": "2023-02-24T23:29:27.159519Z"
 								    }
 								   },
 								   "outputs": [
 								    {
 								     "name": "stdout",
 								     "output_type": "stream",
 								     "text": [
 								      "performance on test data with the tuned config: {'index_selected': 5.208333333333333, 'succeed_assertions': 0.8402777777777778, 'success': 0.7777777777777778, 'gen_cost': 0.00045375000000000005, 'cost': 5.785519999999999, 'inference_cost': 0.04017722222222222}\n"
 								     ]
 								    }
 								   ],
 								   "source": [
 								    "# result = oai.Completion.test(test_data, **config)\n",
 								    "# print(\"performance on test data with the tuned config:\", result)"
 								   ]
 								  },
 								  {
 								   "attachments": {},
 								   "cell_type": "markdown",
 								   "metadata": {},
 								   "source": [
 								    "The result will vary with the inference budget and optimization budget.\n"
 								   ]
 								  }
 								 ],
 								 "metadata": {
 								  "kernelspec": {
 								   "display_name": "Python 3",
 								   "language": "python",
 								   "name": "python3"
 								  },
 								  "language_info": {
 								   "codemirror_mode": {
 								    "name": "ipython",
 								    "version": 3
 								   },
 								   "file_extension": ".py",
 								   "mimetype": "text/x-python",
 								   "name": "python",
 								   "nbconvert_exporter": "python",
 								   "pygments_lexer": "ipython3",
 								   "version": "3.9.16"
 								  },
 								  "vscode": {
 								   "interpreter": {
 								    "hash": "949777d72b0d2535278d3dc13498b2535136f6dfe0678499012e853ee9abcab1"
 								   }
 								  },
 								  "widgets": {
 								   "application/vnd.jupyter.widget-state+json": {
 								    "state": {
 								     "24dd93300e0442788ee6cc1310e5bf14": {
 								      "model_module": "@jupyter-widgets/controls",
 								      "model_module_version": "2.0.0",
 								      "model_name": "HTMLStyleModel",
 								      "state": {
 								       "_model_module": "@jupyter-widgets/controls",
 								       "_model_module_version": "2.0.0",
 								       "_model_name": "HTMLStyleModel",
 								       "_view_count": null,
 								       "_view_module": "@jupyter-widgets/base",
 								       "_view_module_version": "2.0.0",
 								       "_view_name": "StyleView",
 								       "background": null,
 								       "description_width": "",
 								       "font_size": null,
 								       "text_color": null
 								      }
 								     },
 								     "35cd066a31b242bb87b2c106ee72e5f2": {
 								      "model_module": "@jupyter-widgets/controls",
 								      "model_module_version": "2.0.0",
 								      "model_name": "HBoxModel",
 								      "state": {
 								       "_dom_classes": [],
 								       "_model_module": "@jupyter-widgets/controls",
 								       "_model_module_version": "2.0.0",
 								       "_model_name": "HBoxModel",
 								       "_view_count": null,
 								       "_view_module": "@jupyter-widgets/controls",
 								       "_view_module_version": "2.0.0",
 								       "_view_name": "HBoxView",
 								       "box_style": "",
 								       "children": [
 								        "IPY_MODEL_8e7ee7687a99410d88a98a74ecfcea99",
 								        "IPY_MODEL_421e02a11a974b40b3ddb75382b3b640",
 								        "IPY_MODEL_77db9797e78b49438d21c5c8da34b4cb"
 								       ],
 								       "layout": "IPY_MODEL_47d3046236a54b0e8f9ae455a82c7e0b",
 								       "tabbable": null,
 								       "tooltip": null
 								      }
 								     },
 								     "3d5d106a38954af2bb3bde5777702f4e": {
 								      "model_module": "@jupyter-widgets/controls",
 								      "model_module_version": "2.0.0",
 								      "model_name": "HTMLStyleModel",
 								      "state": {
 								       "_model_module": "@jupyter-widgets/controls",
 								       "_model_module_version": "2.0.0",
 								       "_model_name": "HTMLStyleModel",
 								       "_view_count": null,
 								       "_view_module": "@jupyter-widgets/base",
 								       "_view_module_version": "2.0.0",
 								       "_view_name": "StyleView",
 								       "background": null,
 								       "description_width": "",
 								       "font_size": null,
 								       "text_color": null
 								      }
 								     },
 								     "3e1ebb31412443b0bca86a301cbdac11": {
 								      "model_module": "@jupyter-widgets/controls",
 								      "model_module_version": "2.0.0",
 								      "model_name": "ProgressStyleModel",
 								      "state": {
 								       "_model_module": "@jupyter-widgets/controls",
 								       "_model_module_version": "2.0.0",
 								       "_model_name": "ProgressStyleModel",
 								       "_view_count": null,
 								       "_view_module": "@jupyter-widgets/base",
 								       "_view_module_version": "2.0.0",
 								       "_view_name": "StyleView",
 								       "bar_color": null,
 								       "description_width": ""
 								      }
 								     },
 								     "421e02a11a974b40b3ddb75382b3b640": {
 								      "model_module": "@jupyter-widgets/controls",
 								      "model_module_version": "2.0.0",
 								      "model_name": "FloatProgressModel",
 								      "state": {
 								       "_dom_classes": [],
 								       "_model_module": "@jupyter-widgets/controls",
 								       "_model_module_version": "2.0.0",
 								       "_model_name": "FloatProgressModel",
 								       "_view_count": null,
 								       "_view_module": "@jupyter-widgets/controls",
 								       "_view_module_version": "2.0.0",
 								       "_view_name": "ProgressView",
 								       "bar_style": "success",
 								       "description": "",
 								       "description_allow_html": false,
 								       "layout": "IPY_MODEL_e6398d4027c9459a97965b9d91ae484f",
 								       "max": 1,
 								       "min": 0,
 								       "orientation": "horizontal",
 								       "style": "IPY_MODEL_3e1ebb31412443b0bca86a301cbdac11",
 								       "tabbable": null,
 								       "tooltip": null,
 								       "value": 1
 								      }
 								     },
 								     "47d3046236a54b0e8f9ae455a82c7e0b": {
 								      "model_module": "@jupyter-widgets/base",
 								      "model_module_version": "2.0.0",
 								      "model_name": "LayoutModel",
 								      "state": {
 								       "_model_module": "@jupyter-widgets/base",
 								       "_model_module_version": "2.0.0",
 								       "_model_name": "LayoutModel",
 								       "_view_count": null,
 								       "_view_module": "@jupyter-widgets/base",
 								       "_view_module_version": "2.0.0",
 								       "_view_name": "LayoutView",
 								       "align_content": null,
 								       "align_items": null,
 								       "align_self": null,
 								       "border_bottom": null,
 								       "border_left": null,
 								       "border_right": null,
 								       "border_top": null,
 								       "bottom": null,
 								       "display": null,
 								       "flex": null,
 								       "flex_flow": null,
 								       "grid_area": null,
 								       "grid_auto_columns": null,
 								       "grid_auto_flow": null,
 								       "grid_auto_rows": null,
 								       "grid_column": null,
 								       "grid_gap": null,
 								       "grid_row": null,
 								       "grid_template_areas": null,
 								       "grid_template_columns": null,
 								       "grid_template_rows": null,
 								       "height": null,
 								       "justify_content": null,
 								       "justify_items": null,
 								       "left": null,
 								       "margin": null,
 								       "max_height": null,
 								       "max_width": null,
 								       "min_height": null,
 								       "min_width": null,
 								       "object_fit": null,
 								       "object_position": null,
 								       "order": null,
 								       "overflow": null,
 								       "padding": null,
 								       "right": null,
 								       "top": null,
 								       "visibility": null,
 								       "width": null
 								      }
 								     },
 								     "754800f7feb04acea977696e4787d1ff": {
 								      "model_module": "@jupyter-widgets/base",
 								      "model_module_version": "2.0.0",
 								      "model_name": "LayoutModel",
 								      "state": {
 								       "_model_module": "@jupyter-widgets/base",
 								       "_model_module_version": "2.0.0",
 								       "_model_name": "LayoutModel",
 								       "_view_count": null,
 								       "_view_module": "@jupyter-widgets/base",
 								       "_view_module_version": "2.0.0",
 								       "_view_name": "LayoutView",
 								       "align_content": null,
 								       "align_items": null,
 								       "align_self": null,
 								       "border_bottom": null,
 								       "border_left": null,
 								       "border_right": null,
 								       "border_top": null,
 								       "bottom": null,
 								       "display": null,
 								       "flex": null,
 								       "flex_flow": null,
 								       "grid_area": null,
 								       "grid_auto_columns": null,
 								       "grid_auto_flow": null,
 								       "grid_auto_rows": null,
 								       "grid_column": null,
 								       "grid_gap": null,
 								       "grid_row": null,
 								       "grid_template_areas": null,
 								       "grid_template_columns": null,
 								       "grid_template_rows": null,
 								       "height": null,
 								       "justify_content": null,
 								       "justify_items": null,
 								       "left": null,
 								       "margin": null,
 								       "max_height": null,
 								       "max_width": null,
 								       "min_height": null,
 								       "min_width": null,
 								       "object_fit": null,
 								       "object_position": null,
 								       "order": null,
 								       "overflow": null,
 								       "padding": null,
 								       "right": null,
 								       "top": null,
 								       "visibility": null,
 								       "width": null
 								      }
 								     },
 								     "77db9797e78b49438d21c5c8da34b4cb": {
 								      "model_module": "@jupyter-widgets/controls",
 								      "model_module_version": "2.0.0",
 								      "model_name": "HTMLModel",
 								      "state": {
 								       "_dom_classes": [],
 								       "_model_module": "@jupyter-widgets/controls",
 								       "_model_module_version": "2.0.0",
 								       "_model_name": "HTMLModel",
 								       "_view_count": null,
 								       "_view_module": "@jupyter-widgets/controls",
 								       "_view_module_version": "2.0.0",
 								       "_view_name": "HTMLView",
 								       "description": "",
 								       "description_allow_html": false,
 								       "layout": "IPY_MODEL_7b6c4e1c11e249409a1edcd63be450d8",
 								       "placeholder": "",
 								       "style": "IPY_MODEL_3d5d106a38954af2bb3bde5777702f4e",
 								       "tabbable": null,
 								       "tooltip": null,
 								       "value": " 1/1 [00:00&lt;00:00, 44.40it/s]"
 								      }
 								     },
 								     "7b6c4e1c11e249409a1edcd63be450d8": {
 								      "model_module": "@jupyter-widgets/base",
 								      "model_module_version": "2.0.0",
 								      "model_name": "LayoutModel",
 								      "state": {
 								       "_model_module": "@jupyter-widgets/base",
 								       "_model_module_version": "2.0.0",
 								       "_model_name": "LayoutModel",
 								       "_view_count": null,
 								       "_view_module": "@jupyter-widgets/base",
 								       "_view_module_version": "2.0.0",
 								       "_view_name": "LayoutView",
 								       "align_content": null,
 								       "align_items": null,
 								       "align_self": null,
 								       "border_bottom": null,
 								       "border_left": null,
 								       "border_right": null,
 								       "border_top": null,
 								       "bottom": null,
 								       "display": null,
 								       "flex": null,
 								       "flex_flow": null,
 								       "grid_area": null,
 								       "grid_auto_columns": null,
 								       "grid_auto_flow": null,
 								       "grid_auto_rows": null,
 								       "grid_column": null,
 								       "grid_gap": null,
 								       "grid_row": null,
 								       "grid_template_areas": null,
 								       "grid_template_columns": null,
 								       "grid_template_rows": null,
 								       "height": null,
 								       "justify_content": null,
 								       "justify_items": null,
 								       "left": null,
 								       "margin": null,
 								       "max_height": null,
 								       "max_width": null,
 								       "min_height": null,
 								       "min_width": null,
 								       "object_fit": null,
 								       "object_position": null,
 								       "order": null,
 								       "overflow": null,
 								       "padding": null,
 								       "right": null,
 								       "top": null,
 								       "visibility": null,
 								       "width": null
 								      }
 								     },
 								     "8e7ee7687a99410d88a98a74ecfcea99": {
 								      "model_module": "@jupyter-widgets/controls",
 								      "model_module_version": "2.0.0",
 								      "model_name": "HTMLModel",
 								      "state": {
 								       "_dom_classes": [],
 								       "_model_module": "@jupyter-widgets/controls",
 								       "_model_module_version": "2.0.0",
 								       "_model_name": "HTMLModel",
 								       "_view_count": null,
 								       "_view_module": "@jupyter-widgets/controls",
 								       "_view_module_version": "2.0.0",
 								       "_view_name": "HTMLView",
 								       "description": "",
 								       "description_allow_html": false,
 								       "layout": "IPY_MODEL_754800f7feb04acea977696e4787d1ff",
 								       "placeholder": "",
 								       "style": "IPY_MODEL_24dd93300e0442788ee6cc1310e5bf14",
 								       "tabbable": null,
 								       "tooltip": null,
 								       "value": "100%"
 								      }
 								     },
 								     "e6398d4027c9459a97965b9d91ae484f": {
 								      "model_module": "@jupyter-widgets/base",
 								      "model_module_version": "2.0.0",
 								      "model_name": "LayoutModel",
 								      "state": {
 								       "_model_module": "@jupyter-widgets/base",
 								       "_model_module_version": "2.0.0",
 								       "_model_name": "LayoutModel",
 								       "_view_count": null,
 								       "_view_module": "@jupyter-widgets/base",
 								       "_view_module_version": "2.0.0",
 								       "_view_name": "LayoutView",
 								       "align_content": null,
 								       "align_items": null,
 								       "align_self": null,
 								       "border_bottom": null,
 								       "border_left": null,
 								       "border_right": null,
 								       "border_top": null,
 								       "bottom": null,
 								       "display": null,
 								       "flex": null,
 								       "flex_flow": null,
 								       "grid_area": null,
 								       "grid_auto_columns": null,
 								       "grid_auto_flow": null,
 								       "grid_auto_rows": null,
 								       "grid_column": null,
 								       "grid_gap": null,
 								       "grid_row": null,
 								       "grid_template_areas": null,
 								       "grid_template_columns": null,
 								       "grid_template_rows": null,
 								       "height": null,
 								       "justify_content": null,
 								       "justify_items": null,
 								       "left": null,
 								       "margin": null,
 								       "max_height": null,
 								       "max_width": null,
 								       "min_height": null,
 								       "min_width": null,
 								       "object_fit": null,
 								       "object_position": null,
 								       "order": null,
 								       "overflow": null,
 								       "padding": null,
 								       "right": null,
 								       "top": null,
 								       "visibility": null,
 								       "width": null
 								      }
 								     }
 								    },
 								    "version_major": 2,
 								    "version_minor": 0
 								   }
 								  }
 								 },
 								 "nbformat": 4,
 								 "nbformat_minor": 2
 								}