mirror of
https://github.com/microsoft/autogen.git
synced 2025-09-15 11:16:13 +00:00
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved. \n",
"\n",
"Licensed under the MIT License.\n",
"\n",
"# Use FLAML to Tune OpenAI Models\n",
"\n",
"In this notebook, we tune OpenAI models for code generation. We use [the HumanEval benchmark](https://huggingface.co/datasets/openai_humaneval) released by OpenAI for synthesizing programs from docstrings. \n",
"\n",
"## Requirements\n",
"\n",
"FLAML requires `Python>=3.7`. To run this notebook example, please install flaml with the [openai] option:\n",
"```bash\n",
"pip install flaml[openai]\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"execution": {
"iopub.execute_input": "2023-02-05T17:10:46.718963Z",
"iopub.status.busy": "2023-02-05T17:10:46.718348Z",
"iopub.status.idle": "2023-02-05T17:10:46.722958Z",
"shell.execute_reply": "2023-02-05T17:10:46.721858Z"
}
},
"outputs": [],
"source": [
"# %pip install flaml[openai] datasets"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Set your OpenAI key:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"execution": {
"iopub.execute_input": "2023-02-05T17:10:46.725945Z",
"iopub.status.busy": "2023-02-05T17:10:46.725628Z",
"iopub.status.idle": "2023-02-05T17:10:46.732477Z",
"shell.execute_reply": "2023-02-05T17:10:46.731947Z"
}
},
"outputs": [],
"source": [
"import os\n",
"\n",
"if \"OPENAI_API_KEY\" not in os.environ:\n",
"    os.environ[\"OPENAI_API_KEY\"] = \"<your OpenAI API key here>\""
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load dataset\n",
"\n",
"First, we load the HumanEval dataset. The dataset contains 164 examples. We use the first 20 for tuning the generation hyperparameters and the rest for evaluation. In each example, \"prompt\" is the prompt string for eliciting the code generation, \"test\" is the Python code of the unit test for the example, and \"entry_point\" is the name of the function to be tested."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"execution": {
"iopub.execute_input": "2023-02-05T17:10:46.735236Z",
"iopub.status.busy": "2023-02-05T17:10:46.734852Z",
"iopub.status.idle": "2023-02-05T17:10:49.037146Z",
"shell.execute_reply": "2023-02-05T17:10:49.036349Z"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Found cached dataset openai_humaneval (/home/vscode/.cache/huggingface/datasets/openai_humaneval/openai_humaneval/1.0.0/2955cebd73602e828fa8c0a424c594e5fab4ec863b316ca98f3d8fdb6a626e75)\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "f0cac133ee7d4d9b91352e1002d0ece0",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"  0%|          | 0/1 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Loading cached shuffled indices for dataset at /home/vscode/.cache/huggingface/datasets/openai_humaneval/openai_humaneval/1.0.0/2955cebd73602e828fa8c0a424c594e5fab4ec863b316ca98f3d8fdb6a626e75/cache-1e8448101c1b32e8.arrow\n"
]
}
],
"source": [
"import datasets\n",
"\n",
"seed = 41\n",
"data = datasets.load_dataset(\"openai_humaneval\")[\"test\"].shuffle(seed=seed)\n",
"n_tune_data = 20\n",
"tune_data = [\n",
"    {\n",
"        \"prompt\": data[x][\"prompt\"],\n",
"        \"test\": data[x][\"test\"],\n",
"        \"entry_point\": data[x][\"entry_point\"],\n",
"    }\n",
"    for x in range(n_tune_data)\n",
"]\n",
"test_data = [\n",
"    {\n",
"        \"prompt\": data[x][\"prompt\"],\n",
"        \"test\": data[x][\"test\"],\n",
"        \"entry_point\": data[x][\"entry_point\"],\n",
"    }\n",
"    for x in range(n_tune_data, len(data))\n",
"]\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Check a tuning example:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"execution": {
"iopub.execute_input": "2023-02-05T17:10:49.042363Z",
"iopub.status.busy": "2023-02-05T17:10:49.041641Z",
"iopub.status.idle": "2023-02-05T17:10:49.050482Z",
"shell.execute_reply": "2023-02-05T17:10:49.049608Z"
},
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"def compare(game,guess):\n",
"    \"\"\"I think we all remember that feeling when the result of some long-awaited\n",
"    event is finally known. The feelings and thoughts you have at that moment are\n",
"    definitely worth noting down and comparing.\n",
"    Your task is to determine if a person correctly guessed the results of a number of matches.\n",
"    You are given two arrays of scores and guesses of equal length, where each index shows a match. \n",
"    Return an array of the same length denoting how far off each guess was. If they have guessed correctly,\n",
"    the value is 0, and if not, the value is the absolute difference between the guess and the score.\n",
"    \n",
"    \n",
"    example:\n",
"\n",
"    compare([1,2,3,4,5,1],[1,2,3,4,2,-2]) -> [0,0,0,0,3,3]\n",
"    compare([0,5,0,0,0,4],[4,1,1,0,0,-2]) -> [4,4,1,0,0,6]\n",
"    \"\"\"\n",
"\n"
]
}
],
"source": [
"print(tune_data[1][\"prompt\"])"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is one example of the unit test code for verifying the correctness of the generated code:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"execution": {
"iopub.execute_input": "2023-02-05T17:10:49.054027Z",
"iopub.status.busy": "2023-02-05T17:10:49.053625Z",
"iopub.status.idle": "2023-02-05T17:10:49.058153Z",
"shell.execute_reply": "2023-02-05T17:10:49.057421Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"def check(candidate):\n",
"\n",
"    # Check some simple cases\n",
"    assert candidate([1,2,3,4,5,1],[1,2,3,4,2,-2])==[0,0,0,0,3,3], \"This prints if this assert fails 1 (good for debugging!)\"\n",
"    assert candidate([0,0,0,0,0,0],[0,0,0,0,0,0])==[0,0,0,0,0,0], \"This prints if this assert fails 1 (good for debugging!)\"\n",
"    assert candidate([1,2,3],[-1,-2,-3])==[2,4,6], \"This prints if this assert fails 1 (good for debugging!)\"\n",
"    assert candidate([1,2,3,5],[-1,2,3,4])==[2,0,0,1], \"This prints if this assert fails 1 (good for debugging!)\"\n",
"\n",
"    # Check some edge cases that are easy to work out by hand.\n",
"    assert True, \"This prints if this assert fails 2 (also good for debugging!)\"\n",
"\n",
"\n"
]
}
],
"source": [
"print(tune_data[1][\"test\"])"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Define Success Metric\n",
"\n",
"Before we start tuning, we need to define the success metric we want to optimize. For each code generation task, if any of the returned responses passes the test, we consider the task successfully solved. We can then define the mean success rate over a collection of tasks.\n",
"\n",
"### Define a code executor\n",
"\n",
"First, we write a simple code executor. The code executor takes the generated code and the test code as input, and executes them with a time limit."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"execution": {
"iopub.execute_input": "2023-02-05T17:10:49.061566Z",
"iopub.status.busy": "2023-02-05T17:10:49.061259Z",
"iopub.status.idle": "2023-02-05T17:10:49.066812Z",
"shell.execute_reply": "2023-02-05T17:10:49.066136Z"
}
},
"outputs": [],
"source": [
"import signal\n",
"import subprocess\n",
"import sys\n",
"\n",
"def timeout_handler(signum, frame):\n",
"    raise TimeoutError(\"Timed out!\")\n",
"\n",
"signal.signal(signal.SIGALRM, timeout_handler)\n",
"max_exec_time = 3  # seconds\n",
"\n",
"def execute_code(code):\n",
"    code = code.strip()\n",
"    with open(\"codetest.py\", \"w\") as fout:\n",
"        fout.write(code)\n",
"    try:\n",
"        signal.alarm(max_exec_time)\n",
"        result = subprocess.run(\n",
"            [sys.executable, \"codetest.py\"],\n",
"            stdout=subprocess.DEVNULL,\n",
"            stderr=subprocess.PIPE,\n",
"        )\n",
"        signal.alarm(0)\n",
"    except TimeoutError:\n",
"        return 0\n",
"    return int(result.returncode == 0)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"This function writes the code to a temporary file \"codetest.py\" and executes it in a separate process, allowing at most 3 seconds for the code to finish.\n",
"\n",
"### Define a function to evaluate the success for a given program synthesis task"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"execution": {
"iopub.execute_input": "2023-02-05T17:10:49.070111Z",
"iopub.status.busy": "2023-02-05T17:10:49.069513Z",
"iopub.status.idle": "2023-02-05T17:10:49.077679Z",
"shell.execute_reply": "2023-02-05T17:10:49.076855Z"
}
},
"outputs": [],
"source": [
"def success_metrics(responses, prompt, test, entry_point):\n",
"    \"\"\"Check if the task is successful.\n",
"\n",
"    Args:\n",
"        responses (list): The list of responses.\n",
"        prompt (str): The input prompt.\n",
"        test (str): The test code.\n",
"        entry_point (str): The name of the function.\n",
"\n",
"    Returns:\n",
"        dict: The success metrics.\n",
"    \"\"\"\n",
"    success_list = []\n",
"    n = len(responses)\n",
"    for i in range(n):\n",
"        response = responses[i]\n",
"        code = f\"{prompt}{response}\\n{test}\\ncheck({entry_point})\"\n",
"        succeed = execute_code(code)\n",
"        success_list.append(succeed)\n",
"    return {\n",
"        \"expected_success\": 1 - pow(1 - sum(success_list) / n, n),\n",
"        \"success\": any(s for s in success_list),\n",
"    }\n"
]
},
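{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Intuitively, `expected_success` above estimates the probability that at least one of $n$ independently drawn responses passes the test: with $s$ = `sum(success_list)` passing responses out of $n$, the per-response success probability is estimated as $s/n$, so\n",
"\n",
"$$\\\\text{expected\\\\_success} = 1 - \\\\left(1 - \\\\frac{s}{n}\\\\right)^{n}.$$\n",
"\n",
"For example, if 2 out of $n = 4$ responses pass, `expected_success` is $1 - (1 - 2/4)^4 = 0.9375$."
]
},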
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Use the tuning data to find a good configuration\n",
"\n",
"### Import the oai and tune subpackages from flaml.\n",
"\n",
"FLAML provides an API for hyperparameter optimization of OpenAI models: `oai.Completion.tune` performs the tuning, and `oai.Completion.create` makes a request with the tuned config. First, we import oai from flaml:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"execution": {
"iopub.execute_input": "2023-02-05T17:10:49.081129Z",
"iopub.status.busy": "2023-02-05T17:10:49.080837Z",
"iopub.status.idle": "2023-02-05T17:10:50.481290Z",
"shell.execute_reply": "2023-02-05T17:10:50.480663Z"
},
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"from flaml import oai, tune"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"For (local) reproducibility and cost efficiency, we cache responses from OpenAI."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"execution": {
"iopub.execute_input": "2023-02-05T17:10:50.484663Z",
"iopub.status.busy": "2023-02-05T17:10:50.484113Z",
"iopub.status.idle": "2023-02-05T17:10:50.487729Z",
"shell.execute_reply": "2023-02-05T17:10:50.487104Z"
},
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"oai.Completion.set_cache(seed)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"This will create a disk cache in \".cache/{seed}\". You can change `cache_path` in `set_cache()`. The caches for different seeds are stored separately.\n",
"\n",
"### Perform tuning\n",
"\n",
"The tuning takes a while to finish, depending on the optimization budget (~15 mins for the current budget), and is performed under the specified budgets:\n",
"\n",
"* inference_budget is the target average inference budget per instance in the benchmark. For example, 0.02 means the target inference budget is 0.02 dollars, which translates to 1000 tokens (input + output combined) if the Davinci model is used.\n",
"* optimization_budget is the total budget allowed for the tuning. For example, 5 means 5 dollars are allowed in total, which translates to 250K tokens for the Davinci model.\n",
"* num_samples is the number of different hyperparameter configurations allowed to be tried. The tuning stops after either num_samples trials have been run or optimization_budget dollars have been spent, whichever happens first. -1 means no hard restriction on the number of trials; the actual number is decided by optimization_budget.\n",
"\n",
"Users can specify the tuning data, optimization metric, optimization mode, evaluation function, search space, etc. The default search space is\n",
"```python\n",
"price1K = {\n",
"    \"text-ada-001\": 0.0004,\n",
"    \"text-babbage-001\": 0.0005,\n",
"    \"text-curie-001\": 0.002,\n",
"    \"code-cushman-001\": 0.002,  # TODO: update when available\n",
"    \"code-davinci-002\": 0.02,  # TODO: update when available\n",
"    \"text-davinci-002\": 0.02,\n",
"    \"text-davinci-003\": 0.02,\n",
"}\n",
"\n",
"default_search_space = {\n",
"    \"model\": tune.choice(list(price1K.keys())),\n",
"    \"temperature_or_top_p\": tune.choice(\n",
"        [\n",
"            {\"temperature\": tune.uniform(0, 1)},\n",
"            {\"top_p\": tune.uniform(0, 1)},\n",
"        ]\n",
"    ),\n",
"    \"max_tokens\": tune.lograndint(50, 1000),\n",
"    \"n\": tune.randint(1, 100),\n",
"    \"prompt\": \"{prompt}\",\n",
"}\n",
"```\n",
"The default search space can be overridden by the user's input.\n",
"For example, the following code specifies two choices for the model, four choices for the prompt, and a fixed list of stop sequences. For hyperparameters that don't appear in the user's input, the default search space is used."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"execution": {
"iopub.execute_input": "2023-02-05T17:10:50.490897Z",
"iopub.status.busy": "2023-02-05T17:10:50.490347Z",
"iopub.status.idle": "2023-02-05T17:25:42.172098Z",
"shell.execute_reply": "2023-02-05T17:25:42.171428Z"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"\u001b[32m[I 2023-02-05 17:10:50,543]\u001b[0m A new study created in memory with name: optuna\u001b[0m\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\u001b[32m[I 2023-02-05 17:10:50,545]\u001b[0m A new study created in memory with name: optuna\u001b[0m\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:10:50] {806} INFO - trial 1 config: {'model': 'code-davinci-002', 'temperature_or_top_p': {'temperature': 0.36865945026811975}, 'max_tokens': 347, 'n': 1, 'prompt': 1, 'stop': 0}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:11:20] {215} INFO - result: {'expected_success': 0.6, 'success': 0.6, 'total_cost': 0.0925, 'cost': 0.0925, 'inference_cost': 0.004625, 'training_iteration': 0, 'config': {'model': 'code-davinci-002', 'temperature_or_top_p': {'temperature': 0.36865945026811975}, 'max_tokens': 347, 'n': 1, 'prompt': 1, 'stop': 0}, 'config/model': 'code-davinci-002', 'config/temperature_or_top_p': {'temperature': 0.36865945026811975}, 'config/max_tokens': 347, 'config/n': 1, 'config/prompt': 1, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 30.013915538787842}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:11:20] {806} INFO - trial 2 config: {'model': 'code-cushman-001', 'temperature_or_top_p': {'temperature': 0.36865945026811975}, 'max_tokens': 347, 'n': 1, 'prompt': 1, 'stop': 0}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:11:38] {215} INFO - result: {'expected_success': 0.35, 'success': 0.35, 'total_cost': 0.101218, 'cost': 0.008718, 'inference_cost': 0.0004359, 'training_iteration': 0, 'config': {'model': 'code-cushman-001', 'temperature_or_top_p': {'temperature': 0.36865945026811975}, 'max_tokens': 347, 'n': 1, 'prompt': 1, 'stop': 0}, 'config/model': 'code-cushman-001', 'config/temperature_or_top_p': {'temperature': 0.36865945026811975}, 'config/max_tokens': 347, 'config/n': 1, 'config/prompt': 1, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 18.025962352752686}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:11:38] {806} INFO - trial 3 config: {'model': 'code-cushman-001', 'temperature_or_top_p': {'top_p': 0.4985070123025904}, 'max_tokens': 97, 'n': 20, 'prompt': 0, 'stop': 0}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:12:19] {215} INFO - result: {'expected_success': 0.5080706992649381, 'success': 0.55, 'total_cost': 0.1527, 'cost': 0.051482, 'inference_cost': 0.0023973000000000006, 'training_iteration': 0, 'config': {'model': 'code-cushman-001', 'temperature_or_top_p': {'top_p': 0.4985070123025904}, 'max_tokens': 97, 'n': 20, 'prompt': 0, 'stop': 0}, 'config/model': 'code-cushman-001', 'config/temperature_or_top_p': {'top_p': 0.4985070123025904}, 'config/max_tokens': 97, 'config/n': 20, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 41.3308367729187}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:12:19] {806} INFO - trial 4 config: {'model': 'code-cushman-001', 'temperature_or_top_p': {'top_p': 0.6125260668293881}, 'max_tokens': 433, 'n': 29, 'prompt': 0, 'stop': 0}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:14:02] {215} INFO - result: {'expected_success': 0.6186627404336135, 'success': 0.65, 'total_cost': 0.255956, 'cost': 0.103256, 'inference_cost': 0.0049683999999999996, 'training_iteration': 0, 'config': {'model': 'code-cushman-001', 'temperature_or_top_p': {'top_p': 0.6125260668293881}, 'max_tokens': 433, 'n': 29, 'prompt': 0, 'stop': 0}, 'config/model': 'code-cushman-001', 'config/temperature_or_top_p': {'top_p': 0.6125260668293881}, 'config/max_tokens': 433, 'config/n': 29, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 102.09728980064392}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:14:02] {806} INFO - trial 5 config: {'model': 'code-davinci-002', 'temperature_or_top_p': {'temperature': 0.6177669784693172}, 'max_tokens': 231, 'n': 65, 'prompt': 3, 'stop': 0}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:14:10] {215} INFO - result: {'expected_success': 0, 'total_cost': 0.298296, 'cost': 0.04234, 'training_iteration': 0, 'config': {'model': 'code-davinci-002', 'temperature_or_top_p': {'temperature': 0.6177669784693172}, 'max_tokens': 231, 'n': 65, 'prompt': 3, 'stop': 0}, 'config/model': 'code-davinci-002', 'config/temperature_or_top_p': {'temperature': 0.6177669784693172}, 'config/max_tokens': 231, 'config/n': 65, 'config/prompt': 3, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 8.231895685195923}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:14:10] {806} INFO - trial 6 config: {'model': 'code-cushman-001', 'temperature_or_top_p': {'temperature': 0.8286813263076767}, 'max_tokens': 57, 'n': 63, 'prompt': 3, 'stop': 0}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:15:28] {215} INFO - result: {'expected_success': 0.5406309492528286, 'success': 0.6, 'total_cost': 0.427626, 'cost': 0.12933, 'inference_cost': 0.0061049, 'training_iteration': 0, 'config': {'model': 'code-cushman-001', 'temperature_or_top_p': {'temperature': 0.8286813263076767}, 'max_tokens': 57, 'n': 63, 'prompt': 3, 'stop': 0}, 'config/model': 'code-cushman-001', 'config/temperature_or_top_p': {'temperature': 0.8286813263076767}, 'config/max_tokens': 57, 'config/n': 63, 'config/prompt': 3, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 78.36990714073181}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:15:28] {806} INFO - trial 7 config: {'model': 'code-davinci-002', 'temperature_or_top_p': {'top_p': 0.3255116378322488}, 'max_tokens': 81, 'n': 39, 'prompt': 1, 'stop': 0}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:15:38] {215} INFO - result: {'expected_success': 0, 'total_cost': 0.515286, 'cost': 0.08766, 'training_iteration': 0, 'config': {'model': 'code-davinci-002', 'temperature_or_top_p': {'top_p': 0.3255116378322488}, 'max_tokens': 81, 'n': 39, 'prompt': 1, 'stop': 0}, 'config/model': 'code-davinci-002', 'config/temperature_or_top_p': {'top_p': 0.3255116378322488}, 'config/max_tokens': 81, 'config/n': 39, 'config/prompt': 1, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 9.500893115997314}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:15:38] {806} INFO - trial 8 config: {'model': 'code-davinci-002', 'temperature_or_top_p': {'top_p': 0.25137413420705934}, 'max_tokens': 298, 'n': 90, 'prompt': 1, 'stop': 0}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:15:38] {215} INFO - result: {'inference_cost': inf, 'expected_success': -inf, 'cost': 0, 'training_iteration': 0, 'config': {'model': 'code-davinci-002', 'temperature_or_top_p': {'top_p': 0.25137413420705934}, 'max_tokens': 298, 'n': 90, 'prompt': 1, 'stop': 0}, 'config/model': 'code-davinci-002', 'config/temperature_or_top_p': {'top_p': 0.25137413420705934}, 'config/max_tokens': 298, 'config/n': 90, 'config/prompt': 1, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 0.0005786418914794922}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:15:38] {806} INFO - trial 9 config: {'model': 'code-davinci-002', 'temperature_or_top_p': {'top_p': 0.039959208689977266}, 'max_tokens': 180, 'n': 32, 'prompt': 3, 'stop': 0}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:15:45] {215} INFO - result: {'expected_success': 0, 'total_cost': 0.6413860000000001, 'cost': 0.12610000000000002, 'training_iteration': 0, 'config': {'model': 'code-davinci-002', 'temperature_or_top_p': {'top_p': 0.039959208689977266}, 'max_tokens': 180, 'n': 32, 'prompt': 3, 'stop': 0}, 'config/model': 'code-davinci-002', 'config/temperature_or_top_p': {'top_p': 0.039959208689977266}, 'config/max_tokens': 180, 'config/n': 32, 'config/prompt': 3, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 7.046289443969727}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:15:45] {806} INFO - trial 10 config: {'model': 'code-davinci-002', 'temperature_or_top_p': {'top_p': 0.5134666274082884}, 'max_tokens': 298, 'n': 26, 'prompt': 2, 'stop': 0}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:16:00] {215} INFO - result: {'expected_success': 0, 'total_cost': 0.7098260000000002, 'cost': 0.06844, 'training_iteration': 0, 'config': {'model': 'code-davinci-002', 'temperature_or_top_p': {'top_p': 0.5134666274082884}, 'max_tokens': 298, 'n': 26, 'prompt': 2, 'stop': 0}, 'config/model': 'code-davinci-002', 'config/temperature_or_top_p': {'top_p': 0.5134666274082884}, 'config/max_tokens': 298, 'config/n': 26, 'config/prompt': 2, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 14.856077432632446}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:16:00] {806} INFO - trial 11 config: {'model': 'code-davinci-002', 'temperature_or_top_p': {'temperature': 0.06425106069482445}, 'max_tokens': 938, 'n': 34, 'prompt': 1, 'stop': 0}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:16:00] {215} INFO - result: {'inference_cost': inf, 'expected_success': -inf, 'cost': 0, 'training_iteration': 0, 'config': {'model': 'code-davinci-002', 'temperature_or_top_p': {'temperature': 0.06425106069482445}, 'max_tokens': 938, 'n': 34, 'prompt': 1, 'stop': 0}, 'config/model': 'code-davinci-002', 'config/temperature_or_top_p': {'temperature': 0.06425106069482445}, 'config/max_tokens': 938, 'config/n': 34, 'config/prompt': 1, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 0.0007066726684570312}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:16:00] {806} INFO - trial 12 config: {'model': 'code-davinci-002', 'temperature_or_top_p': {'temperature': 0.46453080777933253}, 'max_tokens': 519, 'n': 72, 'prompt': 0, 'stop': 0}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:16:29] {215} INFO - result: {'expected_success': 0, 'total_cost': 0.9465260000000002, 'cost': 0.2367, 'training_iteration': 0, 'config': {'model': 'code-davinci-002', 'temperature_or_top_p': {'temperature': 0.46453080777933253}, 'max_tokens': 519, 'n': 72, 'prompt': 0, 'stop': 0}, 'config/model': 'code-davinci-002', 'config/temperature_or_top_p': {'temperature': 0.46453080777933253}, 'config/max_tokens': 519, 'config/n': 72, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 29.099194526672363}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:16:29] {806} INFO - trial 13 config: {'model': 'code-davinci-002', 'temperature_or_top_p': {'temperature': 0.05047767015399762}, 'max_tokens': 137, 'n': 11, 'prompt': 1, 'stop': 0}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:17:02] {215} INFO - result: {'expected_success': 0.5496534795196485, 'success': 0.55, 'total_cost': 1.2338460000000002, 'cost': 0.28731999999999996, 'inference_cost': 0.013529000000000001, 'training_iteration': 0, 'config': {'model': 'code-davinci-002', 'temperature_or_top_p': {'temperature': 0.05047767015399762}, 'max_tokens': 137, 'n': 11, 'prompt': 1, 'stop': 0}, 'config/model': 'code-davinci-002', 'config/temperature_or_top_p': {'temperature': 0.05047767015399762}, 'config/max_tokens': 137, 'config/n': 11, 'config/prompt': 1, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 33.49773073196411}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:17:02] {806} INFO - trial 14 config: {'model': 'code-davinci-002', 'max_tokens': 263, 'n': 41, 'prompt': 0, 'stop': 0, 'temperature_or_top_p': {'top_p': 0.49834557213253655}}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:17:14] {215} INFO - result: {'expected_success': 0, 'total_cost': 1.293506, 'cost': 0.05966, 'training_iteration': 0, 'config': {'model': 'code-davinci-002', 'max_tokens': 263, 'n': 41, 'prompt': 0, 'stop': 0, 'temperature_or_top_p': {'top_p': 0.49834557213253655}}, 'config/model': 'code-davinci-002', 'config/max_tokens': 263, 'config/n': 41, 'config/prompt': 0, 'config/stop': 0, 'config/temperature_or_top_p': {'top_p': 0.49834557213253655}, 'experiment_tag': 'exp', 'time_total_s': 11.869095802307129}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:17:14] {806} INFO - trial 15 config: {'model': 'code-cushman-001', 'temperature_or_top_p': {'top_p': 0.9770922524005169}, 'max_tokens': 941, 'n': 2, 'prompt': 0, 'stop': 0}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:17:54] {215} INFO - result: {'expected_success': 0.3, 'success': 0.35, 'total_cost': 1.310104, 'cost': 0.016598, 'inference_cost': 0.0008100000000000001, 'training_iteration': 0, 'config': {'model': 'code-cushman-001', 'temperature_or_top_p': {'top_p': 0.9770922524005169}, 'max_tokens': 941, 'n': 2, 'prompt': 0, 'stop': 0}, 'config/model': 'code-cushman-001', 'config/temperature_or_top_p': {'top_p': 0.9770922524005169}, 'config/max_tokens': 941, 'config/n': 2, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 40.00577735900879}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:17:54] {806} INFO - trial 16 config: {'model': 'code-cushman-001', 'temperature_or_top_p': {'temperature': 0.012199907437239511}, 'max_tokens': 133, 'n': 1, 'prompt': 0, 'stop': 0}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:18:09] {215} INFO - result: {'expected_success': 0.3, 'success': 0.3, 'total_cost': 1.31851, 'cost': 0.008406, 'inference_cost': 0.0004203, 'training_iteration': 0, 'config': {'model': 'code-cushman-001', 'temperature_or_top_p': {'temperature': 0.012199907437239511}, 'max_tokens': 133, 'n': 1, 'prompt': 0, 'stop': 0}, 'config/model': 'code-cushman-001', 'config/temperature_or_top_p': {'temperature': 0.012199907437239511}, 'config/max_tokens': 133, 'config/n': 1, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 15.392771005630493}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:18:09] {806} INFO - trial 17 config: {'model': 'code-cushman-001', 'temperature_or_top_p': {'top_p': 0.7782116249357127}, 'max_tokens': 488, 'n': 16, 'prompt': 1, 'stop': 0}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:19:31] {215} INFO - result: {'expected_success': 0.5380678912760336, 'success': 0.55, 'total_cost': 1.3893579999999999, 'cost': 0.07084800000000002, 'inference_cost': 0.0034208000000000003, 'training_iteration': 0, 'config': {'model': 'code-cushman-001', 'temperature_or_top_p': {'top_p': 0.7782116249357127}, 'max_tokens': 488, 'n': 16, 'prompt': 1, 'stop': 0}, 'config/model': 'code-cushman-001', 'config/temperature_or_top_p': {'top_p': 0.7782116249357127}, 'config/max_tokens': 488, 'config/n': 16, 'config/prompt': 1, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 81.26592254638672}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:19:31] {806} INFO - trial 18 config: {'model': 'code-cushman-001', 'temperature_or_top_p': {'temperature': 0.30477882371544746}, 'max_tokens': 109, 'n': 14, 'prompt': 2, 'stop': 0}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:20:08] {215} INFO - result: {'expected_success': 0.5087862474147904, 'success': 0.55, 'total_cost': 1.4376559999999998, 'cost': 0.04829800000000001, 'inference_cost': 0.0023106, 'training_iteration': 0, 'config': {'model': 'code-cushman-001', 'temperature_or_top_p': {'temperature': 0.30477882371544746}, 'max_tokens': 109, 'n': 14, 'prompt': 2, 'stop': 0}, 'config/model': 'code-cushman-001', 'config/temperature_or_top_p': {'temperature': 0.30477882371544746}, 'config/max_tokens': 109, 'config/n': 14, 'config/prompt': 2, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 37.49694466590881}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:20:08] {806} INFO - trial 19 config: {'model': 'code-cushman-001', 'temperature_or_top_p': {'temperature': 0.23467418274340088}, 'max_tokens': 491, 'n': 15, 'prompt': 0, 'stop': 0}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:21:08] {215} INFO - result: {'expected_success': 0.4990449716057748, 'success': 0.5, 'total_cost': 1.4980659999999997, 'cost': 0.06041, 'inference_cost': 0.0029204000000000005, 'training_iteration': 0, 'config': {'model': 'code-cushman-001', 'temperature_or_top_p': {'temperature': 0.23467418274340088}, 'max_tokens': 491, 'n': 15, 'prompt': 0, 'stop': 0}, 'config/model': 'code-cushman-001', 'config/temperature_or_top_p': {'temperature': 0.23467418274340088}, 'config/max_tokens': 491, 'config/n': 15, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 59.75101923942566}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:21:08] {806} INFO - trial 20 config: {'model': 'code-cushman-001', 'temperature_or_top_p': {'top_p': 0.6715981837068888}, 'max_tokens': 162, 'n': 44, 'prompt': 1, 'stop': 0}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:22:22] {215} INFO - result: {'expected_success': 0.7360027932558347, 'success': 0.8, 'total_cost': 1.6311499999999999, 'cost': 0.13308399999999998, 'inference_cost': 0.0064402, 'training_iteration': 0, 'config': {'model': 'code-cushman-001', 'temperature_or_top_p': {'top_p': 0.6715981837068888}, 'max_tokens': 162, 'n': 44, 'prompt': 1, 'stop': 0}, 'config/model': 'code-cushman-001', 'config/temperature_or_top_p': {'top_p': 0.6715981837068888}, 'config/max_tokens': 162, 'config/n': 44, 'config/prompt': 1, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 73.76863956451416}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:22:22] {806} INFO - trial 21 config: {'model': 'code-cushman-001', 'temperature_or_top_p': {'top_p': 0.6909471554430933}, 'max_tokens': 333, 'n': 48, 'prompt': 0, 'stop': 0}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:24:07] {215} INFO - result: {'expected_success': 0.6707939218514837, 'success': 0.7, 'total_cost': 1.7942219999999998, 'cost': 0.16307199999999997, 'inference_cost': 0.0080082, 'training_iteration': 0, 'config': {'model': 'code-cushman-001', 'temperature_or_top_p': {'top_p': 0.6909471554430933}, 'max_tokens': 333, 'n': 48, 'prompt': 0, 'stop': 0}, 'config/model': 'code-cushman-001', 'config/temperature_or_top_p': {'top_p': 0.6909471554430933}, 'config/max_tokens': 333, 'config/n': 48, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 104.97240209579468}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:24:07] {806} INFO - trial 22 config: {'model': 'code-cushman-001', 'max_tokens': 130, 'n': 41, 'prompt': 1, 'stop': 0, 'temperature_or_top_p': {'top_p': 0.5350214472192576}}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:25:09] {215} INFO - result: {'expected_success': 0.6432731468677959, 'success': 0.7, 'total_cost': 1.9095459999999997, 'cost': 0.11532400000000001, 'inference_cost': 0.0057662, 'training_iteration': 0, 'config': {'model': 'code-cushman-001', 'max_tokens': 130, 'n': 41, 'prompt': 1, 'stop': 0, 'temperature_or_top_p': {'top_p': 0.5350214472192576}}, 'config/model': 'code-cushman-001', 'config/max_tokens': 130, 'config/n': 41, 'config/prompt': 1, 'config/stop': 0, 'config/temperature_or_top_p': {'top_p': 0.5350214472192576}, 'experiment_tag': 'exp', 'time_total_s': 62.217190980911255}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:25:09] {806} INFO - trial 23 config: {'model': 'code-cushman-001', 'temperature_or_top_p': {'top_p': 0.829558896181596}, 'max_tokens': 211, 'n': 48, 'prompt': 0, 'stop': 0}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:25:42] {215} INFO - result: {'expected_success': 0, 'total_cost': 2.00839, 'cost': 0.09884399999999999, 'training_iteration': 0, 'config': {'model': 'code-cushman-001', 'temperature_or_top_p': {'top_p': 0.829558896181596}, 'max_tokens': 211, 'n': 48, 'prompt': 0, 'stop': 0}, 'config/model': 'code-cushman-001', 'config/temperature_or_top_p': {'top_p': 0.829558896181596}, 'config/max_tokens': 211, 'config/n': 48, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 32.64395809173584}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-05 17:25:42] {827} WARNING - fail to sample a trial for 100 times in a row, stopping.\n"
]
}
],
"source": [
"config, analysis = oai.Completion.tune(\n",
" data=tune_data, # the data for tuning\n",
" metric=\"expected_success\", # the metric to optimize\n",
" mode=\"max\", # the optimization mode\n",
" eval_func=success_metrics, # the evaluation function to return the success metrics\n",
" # log_file_name=\"logs/humaneval.log\", # the log file name\n",
" inference_budget=0.02, # the inference budget (dollar)\n",
" optimization_budget=2, # the optimization budget (dollar)\n",
" # num_samples can further limit the number of trials for different hyperparameter configurations;\n",
" # -1 means decided by the optimization budget only\n",
" num_samples=-1,\n",
" model=tune.choice(\n",
" [\n",
" # These two models are currently free to use,\n",
" # so no actual cost will be incurred.\n",
" # We use a pseudo price according to their size to simulate the case\n",
" # where the cost is not zero.\n",
" \"code-cushman-001\", \n",
" \"code-davinci-002\",\n",
" ]\n",
" ),\n",
" prompt=[\n",
" \"{prompt}\",\n",
" \"# Python 3{prompt}\",\n",
" \"Complete the following Python function:{prompt}\",\n",
" \"Complete the following Python function while including necessary import statements inside the function:{prompt}\",\n",
" ], # the prompt templates to choose from\n",
" stop=[\"\\nclass\", \"\\ndef\", \"\\nif\", \"\\nprint\"], # the stop sequence\n",
")\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Output tuning results\n",
"\n",
"After tuning, we can print out the config and the result found by FLAML:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"execution": {
"iopub.execute_input": "2023-02-05T17:25:42.175398Z",
"iopub.status.busy": "2023-02-05T17:25:42.175079Z",
"iopub.status.idle": "2023-02-05T17:25:42.179509Z",
"shell.execute_reply": "2023-02-05T17:25:42.178837Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"optimized config {'model': 'code-cushman-001', 'max_tokens': 162, 'n': 44, 'prompt': '# Python 3{prompt}', 'stop': ['\\nclass', '\\ndef', '\\nif', '\\nprint'], 'top_p': 0.6715981837068888}\n",
"best result on tuning data {'expected_success': 0.7360027932558347, 'success': 0.8, 'total_cost': 1.6311499999999999, 'cost': 0.13308399999999998, 'inference_cost': 0.0064402, 'training_iteration': 0, 'config': {'model': 'code-cushman-001', 'temperature_or_top_p': {'top_p': 0.6715981837068888}, 'max_tokens': 162, 'n': 44, 'prompt': 1, 'stop': 0}, 'config/model': 'code-cushman-001', 'config/temperature_or_top_p': {'top_p': 0.6715981837068888}, 'config/max_tokens': 162, 'config/n': 44, 'config/prompt': 1, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 73.76863956451416}\n"
]
}
],
"source": [
"print(\"optimized config\", config)\n",
"print(\"best result on tuning data\", analysis.best_result)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Make a request with the tuned config\n",
"\n",
"We can apply the tuned config to the request for an example task:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"execution": {
"iopub.execute_input": "2023-02-05T17:25:42.182382Z",
"iopub.status.busy": "2023-02-05T17:25:42.181954Z",
"iopub.status.idle": "2023-02-05T17:25:43.701944Z",
"shell.execute_reply": "2023-02-05T17:25:43.701038Z"
},
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\n",
" \"choices\": [\n",
" {\n",
" \"finish_reason\": \"stop\",\n",
" \"index\": 0,\n",
" \"logprobs\": null,\n",
" \"text\": \" return [abs(guess[i]-game[i]) for i in range(len(game))]\"\n",
" },\n",
" {\n",
" \"finish_reason\": \"stop\",\n",
" \"index\": 1,\n",
" \"logprobs\": null,\n",
" \"text\": \" l = len(game)\\n l1 = len(guess)\\n if l!=l1:\\n return []\\n else:\\n list1 = []\\n for i in range(l):\\n if game[i]==guess[i]:\\n list1.append(0)\\n else:\\n list1.append(abs(game[i]-guess[i]))\\n return list1\\n \\n \\n \\n\\n# Testing\"\n",
" },\n",
" {\n",
" \"finish_reason\": \"stop\",\n",
" \"index\": 2,\n",
" \"logprobs\": null,\n",
" \"text\": \" score = 0\\n res = []\\n for i in range(len(game)):\\n if game[i] == guess[i]:\\n res.append(0)\\n else:\\n res.append(abs(guess[i] - game[i]))\\n return res\"\n",
" },\n",
" {\n",
" \"finish_reason\": \"stop\",\n",
" \"index\": 3,\n",
" \"logprobs\": null,\n",
" \"text\": \" return [abs(guess-game[i]) for i in range(len(game))]\"\n",
" },\n",
" {\n",
" \"finish_reason\": \"stop\",\n",
" \"index\": 4,\n",
" \"logprobs\": null,\n",
" \"text\": \" return [abs(game[i]-guess[i]) for i in range(len(game))]\\n\"\n",
" },\n",
" {\n",
" \"finish_reason\": \"stop\",\n",
" \"index\": 5,\n",
" \"logprobs\": null,\n",
" \"text\": \" return [abs(guess-game[i]) for i in range(len(game))]\"\n",
" },\n",
" {\n",
" \"finish_reason\": \"stop\",\n",
" \"index\": 6,\n",
" \"logprobs\": null,\n",
" \"text\": \" score = [abs(game[i]-guess[i]) for i in range(len(game))]\\n return score\\n\\n# solution\"\n",
" },\n",
" {\n",
" \"finish_reason\": \"stop\",\n",
" \"index\": 7,\n",
" \"logprobs\": null,\n",
" \"text\": \" return [abs(guess[i]-game[i]) for i in range(len(guess))]\\n\\n# Some random testing\\nimport random\"\n",
" },\n",
" {\n",
" \"finish_reason\": \"stop\",\n",
" \"index\": 8,\n",
" \"logprobs\": null,\n",
" \"text\": \" res = []\\n for i in range(len(guess)):\\n if game[i] == guess[i]:\\n res.append(0)\\n else:\\n res.append(abs(game[i]-guess[i]))\\n return res\"\n",
" },\n",
" {\n",
" \"finish_reason\": \"stop\",\n",
" \"index\": 9,\n",
" \"logprobs\": null,\n",
" \"text\": \" assert(len(game) == len(guess))\\n diff = []\\n for i in range(len(game)):\\n diff.append(abs(game[i]-guess[i]))\\n return diff\"\n",
" },\n",
" {\n",
" \"finish_reason\": \"length\",\n",
" \"index\": 10,\n",
" \"logprobs\": null,\n",
" \"text\": \" if len(game) != len(guess):\\n raise ValueError(\\\"Arrays must be of same length\\\")\\n elif len(game) < 2:\\n raise ValueError(\\\"Arrays must contain more than one element\\\")\\n elif len(game) == 2:\\n if game[0] == guess[0]:\\n return [0]\\n else:\\n return [abs(guess[0]-game[0])]\\n else:\\n if guess[0] == game[0]:\\n return [0] + compare(game[1:],guess[1:])\\n else:\\n return [abs(guess[0]-game[0])] + compare(game\"\n",
" },\n",
" {\n",
" \"finish_reason\": \"stop\",\n",
" \"index\": 11,\n",
" \"logprobs\": null,\n",
" \"text\": \" l=len(game)\\n l2=len(guess)\\n res=[]\\n if l!=l2:\\n return \\\"different lengths\\\"\\n for i in range(l):\\n res.append(abs(game[i]-guess[i]))\\n return res\"\n",
" },\n",
" {\n",
" \"finish_reason\": \"length\",\n",
" \"index\": 12,\n",
" \"logprobs\": null,\n",
" \"text\": \" if len(game) != len(guess):\\n return None\\n #1st way\\n #result = []\\n #for i in range(len(game)):\\n # result.append(game[i] - guess[i])\\n #return result\\n #2nd way\\n return [game[i] - guess[i] for i in range(len(game))]\\n \\n #3rd way\\n #return [abs(game[i] - guess[i]) for i in range(len(game))]\\n\\n# 4. \\n# Sort the scores in ascending order. If two scores are the same, put the player with the lower rank \\n# at the start\"\n",
" },\n",
" {\n",
" \"finish_reason\": \"stop\",\n",
" \"index\": 13,\n",
" \"logprobs\": null,\n",
" \"text\": \" return [abs(guess[i] - game[i]) for i in range(len(game))]\"\n",
" },\n",
" {\n",
" \"finish_reason\": \"stop\",\n",
" \"index\": 14,\n",
" \"logprobs\": null,\n",
" \"text\": \" return [abs(guess[i]-game[i]) for i in range(len(guess))]\"\n",
" },\n",
" {\n",
" \"finish_reason\": \"length\",\n",
" \"index\": 15,\n",
" \"logprobs\": null,\n",
" \"text\": \" #TODO: \\n\\n\\n#Test.describe(\\\"Basic tests\\\")\\n#Test.assert_equals(compare([1,2,3,4,5,1],[1,2,3,4,2,-2]), [0,0,0,0,3,3])\\n#Test.assert_equals(compare([0,5,0,0,0,4],[4,1,1,0,0,-2]), [4,4,1,0,0,6])\\n#Test.assert_equals(compare([0,0,0,0,0,0],[4,1,1,0,0,-2]), [4,4,1,0,\"\n",
" },\n",
" {\n",
" \"finish_reason\": \"stop\",\n",
" \"index\": 16,\n",
" \"logprobs\": null,\n",
" \"text\": \" # your code here\\n guess=guess[:]\\n for i in range(len(game)):\\n guess[i]+=abs(guess[i]-game[i])\\n return guess\"\n",
" },\n",
" {\n",
" \"finish_reason\": \"stop\",\n",
" \"index\": 17,\n",
" \"logprobs\": null,\n",
" \"text\": \" assert(len(game)==len(guess))\\n ans = []\\n for i in range(len(game)):\\n if game[i]==guess[i]:\\n ans.append(0)\\n else:\\n ans.append(abs(game[i]-guess[i]))\\n return ans\"\n",
" },\n",
" {\n",
" \"finish_reason\": \"stop\",\n",
" \"index\": 18,\n",
" \"logprobs\": null,\n",
" \"text\": \" diffs = []\\n for i in range(len(game)):\\n diffs.append(abs(guess[i]-game[i]))\\n return diffs\"\n",
" },\n",
" {\n",
" \"finish_reason\": \"stop\",\n",
" \"index\": 19,\n",
" \"logprobs\": null,\n",
" \"text\": \" if len(game) != len(guess):\\n return \\\"the game and guess must be the same length\\\"\\n else:\\n newlist = []\\n for i in range(len(game)):\\n if game[i] == guess[i]:\\n newlist.append(0)\\n else:\\n newlist.append(abs(guess[i]-game[i]))\\n return newlist\\n\\ngame = [1,2,3,4,5,1]\\nguess = [1,2,3,4,2,-2]\"\n",
" },\n",
" {\n",
" \"finish_reason\": \"stop\",\n",
" \"index\": 20,\n",
" \"logprobs\": null,\n",
" \"text\": \" #game = [1,2,3,4,5,1]\\n #guess = [1,2,3,4,2,-2]\\n assert len(game) == len(guess), \\\"The number of guesses and the number of scores must be equal.\\\"\\n result = []\\n for i in range(len(game)):\\n if game[i] == guess[i]:\\n result.append(0)\\n else:\\n result.append(abs(game[i]-guess[i]))\\n return result\\n\"\n",
" },\n",
" {\n",
" \"finish_reason\": \"stop\",\n",
" \"index\": 21,\n",
" \"logprobs\": null,\n",
" \"text\": \" results = []\\n for i in range(len(game)):\\n if game[i] == guess[i]:\\n results.append(0)\\n else:\\n results.append(abs(game[i]-guess[i]))\\n return results\"\n",
" },\n",
" {\n",
" \"finish_reason\": \"stop\",\n",
" \"index\": 22,\n",
" \"logprobs\": null,\n",
" \"text\": \" # Check that both inputs are the same length\\n if len(game) != len(guess):\\n raise ValueError(\\\"input arrays must be the same length\\\")\\n # Compare scores to guesses\\n result = []\\n for index in range(len(game)):\\n if game[index] == guess[index]:\\n result.append(0)\\n else:\\n result.append(abs(game[index] - guess[index]))\\n return result\\n\\n# My solution\"\n",
" },\n",
" {\n",
" \"finish_reason\": \"stop\",\n",
" \"index\": 23,\n",
" \"logprobs\": null,\n",
" \"text\": \" res = []\\n for i in range(len(guess)):\\n if guess[i] == game[i]:\\n res.append(0)\\n else:\\n res.append(abs(guess[i] - game[i]))\\n return res\"\n",
" },\n",
" {\n",
" \"finish_reason\": \"stop\",\n",
" \"index\": 24,\n",
" \"logprobs\": null,\n",
" \"text\": \" return [abs(x-y) for x,y in zip(game,guess)]\"\n",
" },\n",
" {\n",
" \"finish_reason\": \"stop\",\n",
" \"index\": 25,\n",
" \"logprobs\": null,\n",
" \"text\": \" return [abs(game[i] - guess[i]) for i in range(len(game))]\"\n",
" },\n",
" {\n",
" \"finish_reason\": \"stop\",\n",
" \"index\": 26,\n",
" \"logprobs\": null,\n",
" \"text\": \" return [abs(x-y) for x,y in zip(game,guess)]\"\n",
" },\n",
" {\n",
" \"finish_reason\": \"stop\",\n",
" \"index\": 27,\n",
" \"logprobs\": null,\n",
" \"text\": \" #my code\\n diff = [abs(x - y) for x, y in zip(guess, game)]\\n return diff\\n\\n\\n# ----------------------------------------------------------------------------------------------------------------------\\n\"\n",
" },\n",
" {\n",
" \"finish_reason\": \"stop\",\n",
" \"index\": 28,\n",
" \"logprobs\": null,\n",
" \"text\": \" return [abs(x-y) for x,y in zip(game,guess)]\\n \\n# Main\"\n",
" },\n",
" {\n",
" \"finish_reason\": \"stop\",\n",
" \"index\": 29,\n",
" \"logprobs\": null,\n",
" \"text\": \" return [abs(guess[i]-game[i]) for i in range(len(game))]\\n\"\n",
" },\n",
" {\n",
" \"finish_reason\": \"stop\",\n",
" \"index\": 30,\n",
" \"logprobs\": null,\n",
" \"text\": \" # Your code here\\n return [abs(guess-score) for score,guess in zip(game,guess)]\"\n",
" },\n",
" {\n",
" \"finish_reason\": \"length\",\n",
" \"index\": 31,\n",
" \"logprobs\": null,\n",
" \"text\": \" # Your code here\\n if not game or not guess:\\n return []\\n diff = [abs(a - b) for a, b in zip(game, guess)]\\n return diff\\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n\"\n",
" },\n",
" {\n",
" \"finish_reason\": \"length\",\n",
" \"index\": 32,\n",
" \"logprobs\": null,\n",
" \"text\": \" # if len(game) != len(guess):\\n # return \\\"arrays are not of equal length\\\"\\n # for i in range(len(game)):\\n # if game[i] == guess[i]:\\n # continue\\n # else:\\n # game[i] = abs(game[i] - guess[i])\\n # return game\\n return [abs(game[i] - guess[i]) for i in range(len(game))]\\n \\n \\n# compare([1,2,3,4,5,1],[1,2,3,4,2,-2])\\n\\n# # Test.describe(\\\"Basic tests\\\")\\n#\"\n",
" },\n",
" {\n",
" \"finish_reason\": \"stop\",\n",
" \"index\": 33,\n",
" \"logprobs\": null,\n",
" \"text\": \" return [abs(guess[i]-game[i]) for i in range(len(guess))]\"\n",
" },\n",
" {\n",
" \"finish_reason\": \"stop\",\n",
" \"index\": 34,\n",
" \"logprobs\": null,\n",
" \"text\": \" return [abs(guess[i] - game[i]) for i in range(len(game))]\\n\\n# These \\\"asserts\\\" using only for self-checking and not necessary for auto-testing\"\n",
" },\n",
" {\n",
" \"finish_reason\": \"stop\",\n",
" \"index\": 35,\n",
" \"logprobs\": null,\n",
" \"text\": \" # game = [1,2,3,4,5,1]\\n # guess = [1,2,3,4,2,-2]\\n answer = []\\n for i in range(len(game)):\\n answer.append(game[i] - guess[i])\\n return answer\\n\\n\\n#Python 2\"\n",
" },\n",
" {\n",
" \"finish_reason\": \"stop\",\n",
" \"index\": 36,\n",
" \"logprobs\": null,\n",
" \"text\": \" #YOUR CODE GOES HERE\\n pass\\n\"\n",
" },\n",
" {\n",
" \"finish_reason\": \"length\",\n",
" \"index\": 37,\n",
" \"logprobs\": null,\n",
" \"text\": \" return [abs(x-y) for x,y in zip(game,guess)]\\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \"\n",
" },\n",
" {\n",
" \"finish_reason\": \"stop\",\n",
" \"index\": 38,\n",
" \"logprobs\": null,\n",
" \"text\": \" # your code\\n return [0 if i == j else abs(i-j) for i,j in zip(game,guess)]\\n\\n# print(compare([1,2,3,4,5,1],[1,2,3,4,2,-2]))\\n\\n# Python 3\"\n",
" },\n",
" {\n",
" \"finish_reason\": \"stop\",\n",
" \"index\": 39,\n",
" \"logprobs\": null,\n",
" \"text\": \" return [abs(game[i]-guess[i]) for i in range(len(game))]\"\n",
" },\n",
" {\n",
" \"finish_reason\": \"stop\",\n",
" \"index\": 40,\n",
" \"logprobs\": null,\n",
" \"text\": \" return [abs(game[i] - guess[i]) for i in range(len(game))]\"\n",
" },\n",
" {\n",
" \"finish_reason\": \"stop\",\n",
" \"index\": 41,\n",
" \"logprobs\": null,\n",
" \"text\": \" return [abs(guess[i] - game[i]) for i in range(len(guess))]\\n\\n#print(compare([1,2,3,4,5,1],[1,2,3,4,2,-2]))\\n#print(compare([0,5,0,0,0,4],[4,1,1,0,0,-2]))\"\n",
" },\n",
" {\n",
" \"finish_reason\": \"stop\",\n",
" \"index\": 42,\n",
" \"logprobs\": null,\n",
" \"text\": \" return [abs(game[i] - guess[i]) for i in range(len(game))]\\n \\n# Python 2\"\n",
" },\n",
" {\n",
" \"finish_reason\": \"stop\",\n",
" \"index\": 43,\n",
" \"logprobs\": null,\n",
" \"text\": \" return [abs(score-guess) for score,guess in zip(game,guess)]\"\n",
" }\n",
" ],\n",
" \"created\": 1675617672,\n",
" \"id\": \"cmpl-6gczQC8OUNTnXpaMqwsyOTZeiLgE6\",\n",
" \"model\": \"code-cushman-001\",\n",
" \"object\": \"text_completion\",\n",
" \"usage\": {\n",
" \"completion_tokens\": 2848,\n",
" \"prompt_tokens\": 242,\n",
" \"total_tokens\": 3090\n",
" }\n",
"}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'expected_success': 1.0, 'success': True}\n"
]
}
],
"source": [
"responses = oai.Completion.create(context=tune_data[1], **config)\n",
"print(responses)\n",
"print(success_metrics([response[\"text\"].rstrip() for response in responses[\"choices\"]], **tune_data[1]))\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Evaluate the success rate on the test data\n",
"\n",
"You can use FLAML's `oai.Completion.eval` to evaluate the performance of the tuned config on an entire dataset. To do that, first set `oai.Completion.data` to the data to evaluate. The following code will take a while to evaluate all 144 test data instances. Compared to the baseline success rate (0.46) on the [HELM benchmark](https://crfm.stanford.edu/helm/latest/?group=code_humaneval), the tuned config has a success rate of 0.72, which can be improved further by increasing the inference and optimization budgets."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"execution": {
"iopub.execute_input": "2023-02-05T17:25:43.705161Z",
"iopub.status.busy": "2023-02-05T17:25:43.704845Z",
"iopub.status.idle": "2023-02-05T17:35:06.804923Z",
"shell.execute_reply": "2023-02-05T17:35:06.803893Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'expected_success': 0.6661693627028851, 'success': 0.7152777777777778, 'total_cost': 2.919666000000001, 'cost': 0.9112759999999998, 'inference_cost': 0.006328305555555556}\n"
]
}
],
"source": [
"oai.Completion.data = test_data\n",
"result = oai.Completion.eval(analysis.best_config, prune=False, eval_only=True)\n",
"print(result)\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.15 (main, Oct 26 2022, 03:47:43) \n[GCC 10.2.1 20210110]"
},
"vscode": {
"interpreter": {
"hash": "949777d72b0d2535278d3dc13498b2535136f6dfe0678499012e853ee9abcab1"
}
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
"state": {
"1e573b802f50427686c094419afde154": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "2.0.0",
"model_name": "FloatProgressModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "2.0.0",
"_model_name": "FloatProgressModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "2.0.0",
"_view_name": "ProgressView",
"bar_style": "success",
"description": "",
"description_allow_html": false,
"layout": "IPY_MODEL_83c2cf7a5a6c4d199ec7a0af631714fa",
"max": 1,
"min": 0,
"orientation": "horizontal",
"style": "IPY_MODEL_b273a2f4abab4419a27333d07a390fe3",
"tabbable": null,
"tooltip": null,
"value": 1
}
},
"2272446ea4b441649ca2bb48f41d1211": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "2.0.0",
"model_name": "HTMLStyleModel",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "2.0.0",
"_model_name": "HTMLStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "2.0.0",
"_view_name": "StyleView",
"background": null,
"description_width": "",
"font_size": null,
"text_color": null
}
},
"83c2cf7a5a6c4d199ec7a0af631714fa": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "2.0.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "2.0.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "2.0.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border_bottom": null,
"border_left": null,
"border_right": null,
"border_top": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_gap": null,
|
|||
|
"grid_row": null,
|
|||
|
"grid_template_areas": null,
|
|||
|
"grid_template_columns": null,
|
|||
|
"grid_template_rows": null,
|
|||
|
"height": null,
|
|||
|
"justify_content": null,
|
|||
|
"justify_items": null,
|
|||
|
"left": null,
|
|||
|
"margin": null,
|
|||
|
"max_height": null,
|
|||
|
"max_width": null,
|
|||
|
"min_height": null,
|
|||
|
"min_width": null,
|
|||
|
"object_fit": null,
|
|||
|
"object_position": null,
|
|||
|
"order": null,
|
|||
|
"overflow": null,
|
|||
|
"padding": null,
|
|||
|
"right": null,
|
|||
|
"top": null,
|
|||
|
"visibility": null,
|
|||
|
"width": null
|
|||
|
}
|
|||
|
},
|
|||
|
"98a6cebd3c7f4e4da00fbe350fab7ece": {
|
|||
|
"model_module": "@jupyter-widgets/base",
|
|||
|
"model_module_version": "2.0.0",
|
|||
|
"model_name": "LayoutModel",
|
|||
|
"state": {
|
|||
|
"_model_module": "@jupyter-widgets/base",
|
|||
|
"_model_module_version": "2.0.0",
|
|||
|
"_model_name": "LayoutModel",
|
|||
|
"_view_count": null,
|
|||
|
"_view_module": "@jupyter-widgets/base",
|
|||
|
"_view_module_version": "2.0.0",
|
|||
|
"_view_name": "LayoutView",
|
|||
|
"align_content": null,
|
|||
|
"align_items": null,
|
|||
|
"align_self": null,
|
|||
|
"border_bottom": null,
|
|||
|
"border_left": null,
|
|||
|
"border_right": null,
|
|||
|
"border_top": null,
|
|||
|
"bottom": null,
|
|||
|
"display": null,
|
|||
|
"flex": null,
|
|||
|
"flex_flow": null,
|
|||
|
"grid_area": null,
|
|||
|
"grid_auto_columns": null,
|
|||
|
"grid_auto_flow": null,
|
|||
|
"grid_auto_rows": null,
|
|||
|
"grid_column": null,
|
|||
|
"grid_gap": null,
|
|||
|
"grid_row": null,
|
|||
|
"grid_template_areas": null,
|
|||
|
"grid_template_columns": null,
|
|||
|
"grid_template_rows": null,
|
|||
|
"height": null,
|
|||
|
"justify_content": null,
|
|||
|
"justify_items": null,
|
|||
|
"left": null,
|
|||
|
"margin": null,
|
|||
|
"max_height": null,
|
|||
|
"max_width": null,
|
|||
|
"min_height": null,
|
|||
|
"min_width": null,
|
|||
|
"object_fit": null,
|
|||
|
"object_position": null,
|
|||
|
"order": null,
|
|||
|
"overflow": null,
|
|||
|
"padding": null,
|
|||
|
"right": null,
|
|||
|
"top": null,
|
|||
|
"visibility": null,
|
|||
|
"width": null
|
|||
|
}
|
|||
|
},
|
|||
|
"b154f9b976aa4f81999cd1d227a7ebde": {
|
|||
|
"model_module": "@jupyter-widgets/base",
|
|||
|
"model_module_version": "2.0.0",
|
|||
|
"model_name": "LayoutModel",
|
|||
|
"state": {
|
|||
|
"_model_module": "@jupyter-widgets/base",
|
|||
|
"_model_module_version": "2.0.0",
|
|||
|
"_model_name": "LayoutModel",
|
|||
|
"_view_count": null,
|
|||
|
"_view_module": "@jupyter-widgets/base",
|
|||
|
"_view_module_version": "2.0.0",
|
|||
|
"_view_name": "LayoutView",
|
|||
|
"align_content": null,
|
|||
|
"align_items": null,
|
|||
|
"align_self": null,
|
|||
|
"border_bottom": null,
|
|||
|
"border_left": null,
|
|||
|
"border_right": null,
|
|||
|
"border_top": null,
|
|||
|
"bottom": null,
|
|||
|
"display": null,
|
|||
|
"flex": null,
|
|||
|
"flex_flow": null,
|
|||
|
"grid_area": null,
|
|||
|
"grid_auto_columns": null,
|
|||
|
"grid_auto_flow": null,
|
|||
|
"grid_auto_rows": null,
|
|||
|
"grid_column": null,
|
|||
|
"grid_gap": null,
|
|||
|
"grid_row": null,
|
|||
|
"grid_template_areas": null,
|
|||
|
"grid_template_columns": null,
|
|||
|
"grid_template_rows": null,
|
|||
|
"height": null,
|
|||
|
"justify_content": null,
|
|||
|
"justify_items": null,
|
|||
|
"left": null,
|
|||
|
"margin": null,
|
|||
|
"max_height": null,
|
|||
|
"max_width": null,
|
|||
|
"min_height": null,
|
|||
|
"min_width": null,
|
|||
|
"object_fit": null,
|
|||
|
"object_position": null,
|
|||
|
"order": null,
|
|||
|
"overflow": null,
|
|||
|
"padding": null,
|
|||
|
"right": null,
|
|||
|
"top": null,
|
|||
|
"visibility": null,
|
|||
|
"width": null
|
|||
|
}
|
|||
|
},
|
|||
|
"b273a2f4abab4419a27333d07a390fe3": {
|
|||
|
"model_module": "@jupyter-widgets/controls",
|
|||
|
"model_module_version": "2.0.0",
|
|||
|
"model_name": "ProgressStyleModel",
|
|||
|
"state": {
|
|||
|
"_model_module": "@jupyter-widgets/controls",
|
|||
|
"_model_module_version": "2.0.0",
|
|||
|
"_model_name": "ProgressStyleModel",
|
|||
|
"_view_count": null,
|
|||
|
"_view_module": "@jupyter-widgets/base",
|
|||
|
"_view_module_version": "2.0.0",
|
|||
|
"_view_name": "StyleView",
|
|||
|
"bar_color": null,
|
|||
|
"description_width": ""
|
|||
|
}
|
|||
|
},
|
|||
|
"b506997e1496457b960f633e9fa4decf": {
|
|||
|
"model_module": "@jupyter-widgets/base",
|
|||
|
"model_module_version": "2.0.0",
|
|||
|
"model_name": "LayoutModel",
|
|||
|
"state": {
|
|||
|
"_model_module": "@jupyter-widgets/base",
|
|||
|
"_model_module_version": "2.0.0",
|
|||
|
"_model_name": "LayoutModel",
|
|||
|
"_view_count": null,
|
|||
|
"_view_module": "@jupyter-widgets/base",
|
|||
|
"_view_module_version": "2.0.0",
|
|||
|
"_view_name": "LayoutView",
|
|||
|
"align_content": null,
|
|||
|
"align_items": null,
|
|||
|
"align_self": null,
|
|||
|
"border_bottom": null,
|
|||
|
"border_left": null,
|
|||
|
"border_right": null,
|
|||
|
"border_top": null,
|
|||
|
"bottom": null,
|
|||
|
"display": null,
|
|||
|
"flex": null,
|
|||
|
"flex_flow": null,
|
|||
|
"grid_area": null,
|
|||
|
"grid_auto_columns": null,
|
|||
|
"grid_auto_flow": null,
|
|||
|
"grid_auto_rows": null,
|
|||
|
"grid_column": null,
|
|||
|
"grid_gap": null,
|
|||
|
"grid_row": null,
|
|||
|
"grid_template_areas": null,
|
|||
|
"grid_template_columns": null,
|
|||
|
"grid_template_rows": null,
|
|||
|
"height": null,
|
|||
|
"justify_content": null,
|
|||
|
"justify_items": null,
|
|||
|
"left": null,
|
|||
|
"margin": null,
|
|||
|
"max_height": null,
|
|||
|
"max_width": null,
|
|||
|
"min_height": null,
|
|||
|
"min_width": null,
|
|||
|
"object_fit": null,
|
|||
|
"object_position": null,
|
|||
|
"order": null,
|
|||
|
"overflow": null,
|
|||
|
"padding": null,
|
|||
|
"right": null,
|
|||
|
"top": null,
|
|||
|
"visibility": null,
|
|||
|
"width": null
|
|||
|
}
|
|||
|
},
|
|||
|
"bcceb81a497643a1a9bd2e5f34ebdf87": {
|
|||
|
"model_module": "@jupyter-widgets/controls",
|
|||
|
"model_module_version": "2.0.0",
|
|||
|
"model_name": "HTMLModel",
|
|||
|
"state": {
|
|||
|
"_dom_classes": [],
|
|||
|
"_model_module": "@jupyter-widgets/controls",
|
|||
|
"_model_module_version": "2.0.0",
|
|||
|
"_model_name": "HTMLModel",
|
|||
|
"_view_count": null,
|
|||
|
"_view_module": "@jupyter-widgets/controls",
|
|||
|
"_view_module_version": "2.0.0",
|
|||
|
"_view_name": "HTMLView",
|
|||
|
"description": "",
|
|||
|
"description_allow_html": false,
|
|||
|
"layout": "IPY_MODEL_b154f9b976aa4f81999cd1d227a7ebde",
|
|||
|
"placeholder": "",
|
|||
|
"style": "IPY_MODEL_2272446ea4b441649ca2bb48f41d1211",
|
|||
|
"tabbable": null,
|
|||
|
"tooltip": null,
|
|||
|
"value": " 1/1 [00:00<00:00, 26.28it/s]"
|
|||
|
}
|
|||
|
},
|
|||
|
"c0f5bae3823f4acda0d8d97581c91028": {
|
|||
|
"model_module": "@jupyter-widgets/controls",
|
|||
|
"model_module_version": "2.0.0",
|
|||
|
"model_name": "HTMLModel",
|
|||
|
"state": {
|
|||
|
"_dom_classes": [],
|
|||
|
"_model_module": "@jupyter-widgets/controls",
|
|||
|
"_model_module_version": "2.0.0",
|
|||
|
"_model_name": "HTMLModel",
|
|||
|
"_view_count": null,
|
|||
|
"_view_module": "@jupyter-widgets/controls",
|
|||
|
"_view_module_version": "2.0.0",
|
|||
|
"_view_name": "HTMLView",
|
|||
|
"description": "",
|
|||
|
"description_allow_html": false,
|
|||
|
"layout": "IPY_MODEL_b506997e1496457b960f633e9fa4decf",
|
|||
|
"placeholder": "",
|
|||
|
"style": "IPY_MODEL_ea73cf00ea304b83bb5ba6616ea9a1f7",
|
|||
|
"tabbable": null,
|
|||
|
"tooltip": null,
|
|||
|
"value": "100%"
|
|||
|
}
|
|||
|
},
|
|||
|
"ea73cf00ea304b83bb5ba6616ea9a1f7": {
|
|||
|
"model_module": "@jupyter-widgets/controls",
|
|||
|
"model_module_version": "2.0.0",
|
|||
|
"model_name": "HTMLStyleModel",
|
|||
|
"state": {
|
|||
|
"_model_module": "@jupyter-widgets/controls",
|
|||
|
"_model_module_version": "2.0.0",
|
|||
|
"_model_name": "HTMLStyleModel",
|
|||
|
"_view_count": null,
|
|||
|
"_view_module": "@jupyter-widgets/base",
|
|||
|
"_view_module_version": "2.0.0",
|
|||
|
"_view_name": "StyleView",
|
|||
|
"background": null,
|
|||
|
"description_width": "",
|
|||
|
"font_size": null,
|
|||
|
"text_color": null
|
|||
|
}
|
|||
|
},
|
|||
|
"f0cac133ee7d4d9b91352e1002d0ece0": {
|
|||
|
"model_module": "@jupyter-widgets/controls",
|
|||
|
"model_module_version": "2.0.0",
|
|||
|
"model_name": "HBoxModel",
|
|||
|
"state": {
|
|||
|
"_dom_classes": [],
|
|||
|
"_model_module": "@jupyter-widgets/controls",
|
|||
|
"_model_module_version": "2.0.0",
|
|||
|
"_model_name": "HBoxModel",
|
|||
|
"_view_count": null,
|
|||
|
"_view_module": "@jupyter-widgets/controls",
|
|||
|
"_view_module_version": "2.0.0",
|
|||
|
"_view_name": "HBoxView",
|
|||
|
"box_style": "",
|
|||
|
"children": [
|
|||
|
"IPY_MODEL_c0f5bae3823f4acda0d8d97581c91028",
|
|||
|
"IPY_MODEL_1e573b802f50427686c094419afde154",
|
|||
|
"IPY_MODEL_bcceb81a497643a1a9bd2e5f34ebdf87"
|
|||
|
],
|
|||
|
"layout": "IPY_MODEL_98a6cebd3c7f4e4da00fbe350fab7ece",
|
|||
|
"tabbable": null,
|
|||
|
"tooltip": null
|
|||
|
}
|
|||
|
}
|
|||
|
},
|
|||
|
"version_major": 2,
|
|||
|
"version_minor": 0
|
|||
|
}
|
|||
|
}
|
|||
|
},
|
|||
|
"nbformat": 4,
|
|||
|
"nbformat_minor": 2
|
|||
|
}
|