{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Copyright (c) Microsoft Corporation. All rights reserved. \n", "\n", "Licensed under the MIT License.\n", "\n", "# Use FLAML to Tune ChatGPT\n", "\n", "In this notebook, we tune OpenAI ChatGPT model for math problem solving. We use [the MATH benchmark](https://crfm.stanford.edu/helm/latest/?group=math_chain_of_thought) for measuring mathematical problem solving on competition math problems with chain-of-thoughts style reasoning. \n", "\n", "## Requirements\n", "\n", "FLAML requires `Python>=3.7`. To run this notebook example, please install flaml with the [openai] option:\n", "```bash\n", "pip install flaml[openai]==1.2.0\n", "```" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2023-02-13T23:40:52.317406Z", "iopub.status.busy": "2023-02-13T23:40:52.316561Z", "iopub.status.idle": "2023-02-13T23:40:52.321193Z", "shell.execute_reply": "2023-02-13T23:40:52.320628Z" } }, "outputs": [], "source": [ "# %pip install flaml[openai]==1.2.0 datasets" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Set your OpenAI key:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2023-02-13T23:40:52.324240Z", "iopub.status.busy": "2023-02-13T23:40:52.323783Z", "iopub.status.idle": "2023-02-13T23:40:52.330570Z", "shell.execute_reply": "2023-02-13T23:40:52.329750Z" } }, "outputs": [], "source": [ "import os\n", "\n", "if \"OPENAI_API_KEY\" not in os.environ:\n", " os.environ[\"OPENAI_API_KEY\"] = \"\"" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "When ChatGPT is available in Azure OpenAI, uncomment the following to use Azure OpenAI:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2023-02-13T23:40:52.333547Z", "iopub.status.busy": "2023-02-13T23:40:52.333249Z", "iopub.status.idle": "2023-02-13T23:40:52.336508Z", "shell.execute_reply": "2023-02-13T23:40:52.335858Z" } }, "outputs": [], "source": [ "# openai.api_type = \"azure\"\n", "# openai.api_base = \"https://.openai.azure.com/\"\n", "# openai.api_version = \"2023-3-01\"" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Load dataset\n", "\n", "First, we load the competition_math dataset. The dataset contains 457 \"Level 1\" examples. We use a random sample of 20 examples for tuning the generation hyperparameters and the remaining for evaluation. We use one demonstration example in the prompt." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2023-02-13T23:40:52.339977Z", "iopub.status.busy": "2023-02-13T23:40:52.339556Z", "iopub.status.idle": "2023-02-13T23:40:54.603349Z", "shell.execute_reply": "2023-02-13T23:40:54.602630Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Found cached dataset competition_math (/home/vscode/.cache/huggingface/datasets/competition_math/default/1.0.0/2a2a2995c2847186883ecd64f69be7d602b8a6f6b51950624d4dc2263f93333b)\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "79ced88ccf474030bda228436813e94b", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/2 [00:00 Optional[str]:\n", " \"\"\"Source: https://github.com/hendrycks/math\n", " Extract the text within a \\\\boxed{...} environment.\n", " Example:\n", " >>> remove_boxed(\\\\boxed{\\\\frac{2}{3}})\n", " \\\\frac{2}{3}\n", " \"\"\"\n", " left = \"\\\\boxed{\"\n", " try:\n", " assert string[: len(left)] == left\n", " assert string[-1] == \"}\"\n", " return string[len(left) : -1]\n", " except Exception:\n", " return None\n", "\n", "\n", "def last_boxed_only_string(string: str) -> Optional[str]:\n", " \"\"\"Source: https://github.com/hendrycks/math\n", " Extract the last \\\\boxed{...} or \\\\fbox{...} element from a string.\n", " \"\"\"\n", " idx = string.rfind(\"\\\\boxed\")\n", " if idx < 0:\n", " idx = string.rfind(\"\\\\fbox\")\n", " if idx < 0:\n", " return None\n", "\n", " i = idx\n", " right_brace_idx = None\n", " num_left_braces_open = 0\n", " while i < len(string):\n", " if string[i] == \"{\":\n", " num_left_braces_open += 1\n", " if string[i] == \"}\":\n", " num_left_braces_open -= 1\n", " if num_left_braces_open == 0:\n", " right_brace_idx = i\n", " break\n", " i += 1\n", "\n", " if right_brace_idx is None:\n", " retval = None\n", " else:\n", " retval = string[idx : right_brace_idx + 1]\n", "\n", " return retval\n", "\n", "\n", "def _fix_fracs(string: str) -> str:\n", " \"\"\"Source: https://github.com/hendrycks/math\n", " Reformat fractions.\n", " Examples:\n", " >>> _fix_fracs(\"\\\\frac1b\")\n", " \\frac{1}{b}\n", " >>> _fix_fracs(\"\\\\frac12\")\n", " \\frac{1}{2}\n", " >>> _fix_fracs(\"\\\\frac1{72}\")\n", " \\frac{1}{72}\n", " \"\"\"\n", " substrs = string.split(\"\\\\frac\")\n", " new_str = substrs[0]\n", " if len(substrs) > 1:\n", " substrs = substrs[1:]\n", " for substr in substrs:\n", " new_str += \"\\\\frac\"\n", " if substr[0] == \"{\":\n", " new_str += substr\n", " else:\n", " try:\n", " assert len(substr) >= 2\n", " except Exception:\n", " return string\n", " a = substr[0]\n", " b = substr[1]\n", " if b != \"{\":\n", " if len(substr) > 2:\n", " post_substr = substr[2:]\n", " new_str += \"{\" + a + \"}{\" + b + \"}\" + post_substr\n", " else:\n", " new_str += \"{\" + a + \"}{\" + b + \"}\"\n", " else:\n", " if len(substr) > 2:\n", " post_substr = substr[2:]\n", " new_str += \"{\" + a + \"}\" + b + post_substr\n", " else:\n", " new_str += \"{\" + a + \"}\" + b\n", " string = new_str\n", " return string\n", "\n", "\n", "def _fix_a_slash_b(string: str) -> str:\n", " \"\"\"Source: https://github.com/hendrycks/math\n", " Reformat fractions formatted as a/b to \\\\frac{a}{b}.\n", " Example:\n", " >>> _fix_a_slash_b(\"2/3\")\n", " \\frac{2}{3}\n", " \"\"\"\n", " if len(string.split(\"/\")) != 2:\n", " return string\n", " a_str = string.split(\"/\")[0]\n", " b_str = string.split(\"/\")[1]\n", " try:\n", " a = int(a_str)\n", " b = int(b_str)\n", " assert string 
== \"{}/{}\".format(a, b)\n", " new_string = \"\\\\frac{\" + str(a) + \"}{\" + str(b) + \"}\"\n", " return new_string\n", " except Exception:\n", " return string\n", "\n", "\n", "def _remove_right_units(string: str) -> str:\n", " \"\"\"Source: https://github.com/hendrycks/math\n", " Remove units (on the right).\n", " \"\\\\text{ \" only ever occurs (at least in the val set) when describing units.\n", " \"\"\"\n", " if \"\\\\text{ \" in string:\n", " splits = string.split(\"\\\\text{ \")\n", " assert len(splits) == 2\n", " return splits[0]\n", " else:\n", " return string\n", "\n", "\n", "def _fix_sqrt(string: str) -> str:\n", " \"\"\"Source: https://github.com/hendrycks/math\n", " Reformat square roots.\n", " Example:\n", " >>> _fix_sqrt(\"\\\\sqrt3\")\n", " \\sqrt{3}\n", " \"\"\"\n", " if \"\\\\sqrt\" not in string:\n", " return string\n", " splits = string.split(\"\\\\sqrt\")\n", " new_string = splits[0]\n", " for split in splits[1:]:\n", " if split[0] != \"{\":\n", " a = split[0]\n", " new_substr = \"\\\\sqrt{\" + a + \"}\" + split[1:]\n", " else:\n", " new_substr = \"\\\\sqrt\" + split\n", " new_string += new_substr\n", " return new_string\n", "\n", "\n", "def _strip_string(string: str) -> str:\n", " \"\"\"Source: https://github.com/hendrycks/math\n", " Apply the reformatting helper functions above.\n", " \"\"\"\n", " # linebreaks\n", " string = string.replace(\"\\n\", \"\")\n", " # print(string)\n", "\n", " # remove inverse spaces\n", " string = string.replace(\"\\\\!\", \"\")\n", " # print(string)\n", "\n", " # replace \\\\ with \\\n", " string = string.replace(\"\\\\\\\\\", \"\\\\\")\n", " # print(string)\n", "\n", " # replace tfrac and dfrac with frac\n", " string = string.replace(\"tfrac\", \"frac\")\n", " string = string.replace(\"dfrac\", \"frac\")\n", " # print(string)\n", "\n", " # remove \\left and \\right\n", " string = string.replace(\"\\\\left\", \"\")\n", " string = string.replace(\"\\\\right\", \"\")\n", " # print(string)\n", "\n", " # Remove circ (degrees)\n", " string = string.replace(\"^{\\\\circ}\", \"\")\n", " string = string.replace(\"^\\\\circ\", \"\")\n", "\n", " # remove dollar signs\n", " string = string.replace(\"\\\\$\", \"\")\n", "\n", " # remove units (on the right)\n", " string = _remove_right_units(string)\n", "\n", " # remove percentage\n", " string = string.replace(\"\\\\%\", \"\")\n", " string = string.replace(\"\\%\", \"\")\n", "\n", " # \" 0.\" equivalent to \" .\" and \"{0.\" equivalent to \"{.\" Alternatively, add \"0\" if \".\" is the start of the string\n", " string = string.replace(\" .\", \" 0.\")\n", " string = string.replace(\"{.\", \"{0.\")\n", " # if empty, return empty string\n", " if len(string) == 0:\n", " return string\n", " if string[0] == \".\":\n", " string = \"0\" + string\n", "\n", " # to consider: get rid of e.g. 
\"k = \" or \"q = \" at beginning\n", " if len(string.split(\"=\")) == 2:\n", " if len(string.split(\"=\")[0]) <= 2:\n", " string = string.split(\"=\")[1]\n", "\n", " # fix sqrt3 --> sqrt{3}\n", " string = _fix_sqrt(string)\n", "\n", " # remove spaces\n", " string = string.replace(\" \", \"\")\n", "\n", " # \\frac1b or \\frac12 --> \\frac{1}{b} and \\frac{1}{2}, etc.\n", " # Even works with \\frac1{72} (but not \\frac{72}1).\n", " # Also does a/b --> \\\\frac{a}{b}\n", " string = _fix_fracs(string)\n", "\n", " # manually change 0.5 --> \\frac{1}{2}\n", " if string == \"0.5\":\n", " string = \"\\\\frac{1}{2}\"\n", "\n", " # NOTE: X/Y changed to \\frac{X}{Y} in dataset, but in simple cases fix in case the model output is X/Y\n", " string = _fix_a_slash_b(string)\n", "\n", " return string\n", "\n", "\n", "def get_answer(solution: Optional[str]) -> Optional[str]:\n", " if solution is None:\n", " return None\n", " last_boxed = last_boxed_only_string(solution)\n", " if last_boxed is None:\n", " return None\n", " answer = remove_boxed(last_boxed)\n", " if answer is None:\n", " return None\n", " return answer\n", "\n", "\n", "def is_equiv(str1: Optional[str], str2: Optional[str]) -> float:\n", " \"\"\"Returns (as a float) whether two strings containing math are equivalent up to differences of formatting in\n", " - units\n", " - fractions\n", " - square roots\n", " - superfluous LaTeX.\n", " Source: https://github.com/hendrycks/math\n", " \"\"\"\n", " if str1 is None and str2 is None:\n", " print(\"WARNING: Both None\")\n", " return 1.0\n", " if str1 is None or str2 is None:\n", " return 0.0\n", "\n", " try:\n", " ss1 = _strip_string(str1)\n", " ss2 = _strip_string(str2)\n", " return float(ss1 == ss2)\n", " except Exception:\n", " return float(str1 == str2)\n", "\n", "\n", "def is_equiv_chain_of_thought(str1: str, str2: str) -> float:\n", " \"\"\"Strips the solution first before calling `is_equiv`.\"\"\"\n", " ans1 = get_answer(str1)\n", " ans2 = get_answer(str2)\n", "\n", " return is_equiv(ans1, ans2)\n", "\n", "\n", "def success_metrics(responses, solution, **args):\n", " \"\"\"Check if each response is correct.\n", " \n", " Args:\n", " responses (list): The list of responses.\n", " solution (str): The canonical solution.\n", " \n", " Returns:\n", " dict: The success metrics.\n", " \"\"\"\n", " success_list = []\n", " n = len(responses)\n", " for i in range(n):\n", " response = responses[i]\n", " succeed = is_equiv_chain_of_thought(response, solution)\n", " success_list.append(succeed)\n", " return {\n", " \"expected_success\": 1 - pow(1 - sum(success_list) / n, n),\n", " \"success\": any(s for s in success_list),\n", " }\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Use the tuning data to find a good configuration\n", "\n", "### Import the oai and tune subpackages from flaml.\n", "\n", "FLAML has provided an API for hyperparameter optimization of OpenAI ChatGPT models: `oai.ChatCompletion.tune` and to make a request with the tuned config: `oai.ChatCompletion.create`. 
First, we import `oai` from flaml:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2023-02-13T23:40:54.634335Z", "iopub.status.busy": "2023-02-13T23:40:54.633929Z", "iopub.status.idle": "2023-02-13T23:40:56.105700Z", "shell.execute_reply": "2023-02-13T23:40:56.105085Z" }, "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "from flaml import oai" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "For (local) reproducibility and cost efficiency, we cache responses from OpenAI." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "execution": { "iopub.execute_input": "2023-02-13T23:40:56.109177Z", "iopub.status.busy": "2023-02-13T23:40:56.108624Z", "iopub.status.idle": "2023-02-13T23:40:56.112651Z", "shell.execute_reply": "2023-02-13T23:40:56.112076Z" }, "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "oai.ChatCompletion.set_cache(seed)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "This will create a disk cache in \".cache/{seed}\". You can change `cache_path` in `set_cache()`. The caches for different seeds are stored separately.\n", "\n", "### Perform tuning\n", "\n", "The tuning will take a while to finish, depending on the optimization budget. It is performed under the following specified budgets:\n", "\n", "* `inference_budget` is the target average inference budget per instance in the benchmark. For example, 0.002 means the target inference budget is 0.002 dollars, which translates to 1000 tokens (input + output combined) if the gpt-3.5-turbo model is used.\n", "* `optimization_budget` is the total budget allowed for the tuning. For example, 0.5 means 0.5 dollars are allowed in total, which translates to 250K tokens for the gpt-3.5-turbo model.\n", "* `num_samples` is the number of different hyperparameter configurations allowed to be tried. The tuning stops after either `num_samples` trials have been completed or `optimization_budget` dollars have been spent, whichever happens first. -1 means no hard restriction on the number of trials; the actual number is decided by `optimization_budget`.\n", "\n", "Users can specify the tuning data, optimization metric, optimization mode, evaluation function, search space, etc. The default search space is:\n", "\n", "```python\n", "price1K = {\n", " \"gpt-3.5-turbo\": 0.002,\n", "}\n", "\n", "default_search_space = {\n", " \"model\": tune.choice(list(price1K.keys())),\n", " \"temperature_or_top_p\": tune.choice(\n", " [\n", " {\"temperature\": tune.uniform(0, 1)},\n", " {\"top_p\": tune.uniform(0, 1)},\n", " ]\n", " ),\n", " \"max_tokens\": tune.lograndint(50, 1000),\n", " \"n\": tune.randint(1, 100),\n", " \"prompt\": \"{prompt}\",\n", "}\n", "```\n", "\n", "The default search space can be overridden by the user's input. For example, the following code specifies a fixed prompt template and a stop sequence; for hyperparameters that don't appear in the user's input, the default search space is used."
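, "\n\n", "One can also pin an individual hyperparameter by passing it directly; anything not passed keeps its default search range. A minimal sketch (illustrative values, assuming `tune_data`, `success_metrics` and `prompts` are defined as above):\n", "\n", "```python\n", "config, analysis = oai.ChatCompletion.tune(\n", "    data=tune_data,\n", "    metric=\"expected_success\",\n", "    mode=\"max\",\n", "    eval_func=success_metrics,\n", "    inference_budget=0.002,\n", "    optimization_budget=0.5,\n", "    num_samples=-1,\n", "    prompt=prompts,  # fixed prompt template(s) instead of the default \"{prompt}\"\n", "    stop=\"###\",      # fixed stop sequence\n", "    n=5,             # pin n instead of searching over randint(1, 100)\n", ")\n", "```"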
] }, { "cell_type": "code", "execution_count": 31, "metadata": { "execution": { "iopub.execute_input": "2023-02-13T23:40:56.115383Z", "iopub.status.busy": "2023-02-13T23:40:56.114975Z", "iopub.status.idle": "2023-02-13T23:41:55.045654Z", "shell.execute_reply": "2023-02-13T23:41:55.044973Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32m[I 2023-03-05 05:01:24,381]\u001b[0m A new study created in memory with name: optuna\u001b[0m\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[flaml.tune.tune: 03-05 05:01:24] {811} INFO - trial 1 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'top_p': 0.36280922847807595}, 'max_tokens': 347, 'n': 10, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:01:24] {215} INFO - result: {'expected_success': 0, 'total_cost': 0.011049999999999999, 'cost': 0.011049999999999999, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'top_p': 0.36280922847807595}, 'max_tokens': 347, 'n': 10, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'top_p': 0.36280922847807595}, 'config/max_tokens': 347, 'config/n': 10, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 0.0027980804443359375}\n", "[flaml.tune.tune: 03-05 05:01:24] {811} INFO - trial 2 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.6336482349262754}, 'max_tokens': 470, 'n': 50, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:01:24] {215} INFO - result: {'inference_cost': inf, 'expected_success': -inf, 'cost': 0, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.6336482349262754}, 'max_tokens': 470, 'n': 50, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'temperature': 0.6336482349262754}, 'config/max_tokens': 470, 'config/n': 50, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 0.0004801750183105469}\n", "[flaml.tune.tune: 03-05 05:01:24] {811} INFO - trial 3 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.7605307121989587}, 'max_tokens': 82, 'n': 9, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:01:24] {215} INFO - result: {'expected_success': 0.5308234838865221, 'success': 0.6, 'total_cost': 0.043492, 'cost': 0.032442, 'inference_cost': 0.0016220999999999998, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.7605307121989587}, 'max_tokens': 82, 'n': 9, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'temperature': 0.7605307121989587}, 'config/max_tokens': 82, 'config/n': 9, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 0.0066220760345458984}\n", "[flaml.tune.tune: 03-05 05:01:24] {811} INFO - trial 4 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'top_p': 0.003948266327914451}, 'max_tokens': 231, 'n': 81, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:01:24] {215} INFO - result: {'expected_success': 0, 'total_cost': 0.049, 'cost': 0.005508, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'top_p': 0.003948266327914451}, 'max_tokens': 231, 'n': 81, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'top_p': 0.003948266327914451}, 'config/max_tokens': 231, 'config/n': 81, 'config/prompt': 0, 'config/stop': 0, 
'experiment_tag': 'exp', 'time_total_s': 0.0020475387573242188}\n", "[flaml.tune.tune: 03-05 05:01:24] {811} INFO - trial 5 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'top_p': 0.29187606817063316}, 'max_tokens': 781, 'n': 71, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:01:24] {215} INFO - result: {'inference_cost': inf, 'expected_success': -inf, 'cost': 0, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'top_p': 0.29187606817063316}, 'max_tokens': 781, 'n': 71, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'top_p': 0.29187606817063316}, 'config/max_tokens': 781, 'config/n': 71, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 0.0005230903625488281}\n", "[flaml.tune.tune: 03-05 05:01:24] {811} INFO - trial 6 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.3733407600514692}, 'max_tokens': 375, 'n': 44, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:01:24] {215} INFO - result: {'inference_cost': inf, 'expected_success': -inf, 'cost': 0, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.3733407600514692}, 'max_tokens': 375, 'n': 44, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'temperature': 0.3733407600514692}, 'config/max_tokens': 375, 'config/n': 44, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 0.000446319580078125}\n", "[flaml.tune.tune: 03-05 05:01:24] {811} INFO - trial 7 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'top_p': 0.5131382425543909}, 'max_tokens': 350, 'n': 60, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:01:24] {215} INFO - result: {'inference_cost': inf, 'expected_success': -inf, 'cost': 0, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'top_p': 0.5131382425543909}, 'max_tokens': 350, 'n': 60, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'top_p': 0.5131382425543909}, 'config/max_tokens': 350, 'config/n': 60, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 0.00055694580078125}\n", "[flaml.tune.tune: 03-05 05:01:24] {811} INFO - trial 8 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.9086488808086682}, 'max_tokens': 129, 'n': 9, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:01:24] {215} INFO - result: {'expected_success': 0, 'total_cost': 0.08172600000000001, 'cost': 0.032726000000000005, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.9086488808086682}, 'max_tokens': 129, 'n': 9, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'temperature': 0.9086488808086682}, 'config/max_tokens': 129, 'config/n': 9, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 0.004898548126220703}\n", "[flaml.tune.tune: 03-05 05:01:24] {811} INFO - trial 9 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.8286813263076767}, 'max_tokens': 57, 'n': 63, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:01:24] {215} INFO - result: {'expected_success': 0, 'total_cost': 0.09077800000000001, 'cost': 0.009052000000000001, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.8286813263076767}, 'max_tokens': 57, 'n': 
63, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'temperature': 0.8286813263076767}, 'config/max_tokens': 57, 'config/n': 63, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 0.0021355152130126953}\n", "[flaml.tune.tune: 03-05 05:01:24] {811} INFO - trial 10 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'top_p': 0.1989475396788123}, 'max_tokens': 650, 'n': 35, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:01:24] {215} INFO - result: {'inference_cost': inf, 'expected_success': -inf, 'cost': 0, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'top_p': 0.1989475396788123}, 'max_tokens': 650, 'n': 35, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'top_p': 0.1989475396788123}, 'config/max_tokens': 650, 'config/n': 35, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 0.0006568431854248047}\n", "[flaml.tune.tune: 03-05 05:01:24] {811} INFO - trial 11 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.8839364795611863}, 'max_tokens': 132, 'n': 17, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:01:24] {215} INFO - result: {'expected_success': 0, 'total_cost': 0.09582600000000001, 'cost': 0.005048, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.8839364795611863}, 'max_tokens': 132, 'n': 17, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'temperature': 0.8839364795611863}, 'config/max_tokens': 132, 'config/n': 17, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 0.009762048721313477}\n", "[flaml.tune.tune: 03-05 05:01:24] {811} INFO - trial 12 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.8211056578369285}, 'max_tokens': 78, 'n': 39, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:01:24] {215} INFO - result: {'inference_cost': inf, 'expected_success': -inf, 'cost': 0, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.8211056578369285}, 'max_tokens': 78, 'n': 39, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'temperature': 0.8211056578369285}, 'config/max_tokens': 78, 'config/n': 39, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 0.007121086120605469}\n", "[flaml.tune.tune: 03-05 05:01:24] {811} INFO - trial 13 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.0422875090290305}, 'max_tokens': 56, 'n': 3, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:01:35] {215} INFO - result: {'expected_success': 0.15, 'success': 0.15, 'total_cost': 0.10778599999999998, 'cost': 0.011960000000000002, 'inference_cost': 0.000598, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.0422875090290305}, 'max_tokens': 56, 'n': 3, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'temperature': 0.0422875090290305}, 'config/max_tokens': 56, 'config/n': 3, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 10.761135816574097}\n", "[flaml.tune.tune: 03-05 05:01:35] {811} INFO - trial 14 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.11030610637969397}, 'max_tokens': 52, 'n': 3, 'prompt': 0, 'stop': 0}\n", 
"[flaml.tune.tune: 03-05 05:01:52] {215} INFO - result: {'expected_success': 0.1, 'success': 0.1, 'total_cost': 0.11931399999999996, 'cost': 0.011528, 'inference_cost': 0.0005764, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.11030610637969397}, 'max_tokens': 52, 'n': 3, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'temperature': 0.11030610637969397}, 'config/max_tokens': 52, 'config/n': 3, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 17.322299242019653}\n", "[flaml.tune.tune: 03-05 05:01:52] {811} INFO - trial 15 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.5632321190691856}, 'max_tokens': 89, 'n': 22, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:01:52] {215} INFO - result: {'inference_cost': inf, 'expected_success': -inf, 'cost': 0, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.5632321190691856}, 'max_tokens': 89, 'n': 22, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'temperature': 0.5632321190691856}, 'config/max_tokens': 89, 'config/n': 22, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 0.0008306503295898438}\n", "[flaml.tune.tune: 03-05 05:01:52] {811} INFO - trial 16 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.04561271084264061}, 'max_tokens': 51, 'n': 98, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:01:54] {215} INFO - result: {'expected_success': 0, 'total_cost': 0.12412799999999996, 'cost': 0.004814, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.04561271084264061}, 'max_tokens': 51, 'n': 98, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'temperature': 0.04561271084264061}, 'config/max_tokens': 51, 'config/n': 98, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 1.575875997543335}\n", "[flaml.tune.tune: 03-05 05:01:54] {811} INFO - trial 17 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.5087240651577944}, 'max_tokens': 95, 'n': 1, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:02:20] {215} INFO - result: {'expected_success': 0.3, 'success': 0.3, 'total_cost': 0.13279399999999997, 'cost': 0.008666, 'inference_cost': 0.0004333, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.5087240651577944}, 'max_tokens': 95, 'n': 1, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'temperature': 0.5087240651577944}, 'config/max_tokens': 95, 'config/n': 1, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 26.14193034172058}\n", "[flaml.tune.tune: 03-05 05:02:20] {811} INFO - trial 18 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.6040740802039921}, 'max_tokens': 129, 'n': 25, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:02:20] {215} INFO - result: {'inference_cost': inf, 'expected_success': -inf, 'cost': 0, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.6040740802039921}, 'max_tokens': 129, 'n': 25, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'temperature': 0.6040740802039921}, 'config/max_tokens': 129, 
'config/n': 25, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 0.0008137226104736328}\n", "[flaml.tune.tune: 03-05 05:02:20] {811} INFO - trial 19 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.3754115138138923}, 'max_tokens': 86, 'n': 12, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:02:33] {215} INFO - result: {'expected_success': 0, 'total_cost': 0.149274, 'cost': 0.01648, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.3754115138138923}, 'max_tokens': 86, 'n': 12, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'temperature': 0.3754115138138923}, 'config/max_tokens': 86, 'config/n': 12, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 13.519219398498535}\n", "[flaml.tune.tune: 03-05 05:02:33] {811} INFO - trial 20 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.6887263877538047}, 'max_tokens': 173, 'n': 28, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:02:33] {215} INFO - result: {'inference_cost': inf, 'expected_success': -inf, 'cost': 0, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.6887263877538047}, 'max_tokens': 173, 'n': 28, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'temperature': 0.6887263877538047}, 'config/max_tokens': 173, 'config/n': 28, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 0.0005598068237304688}\n", "[flaml.tune.tune: 03-05 05:02:33] {811} INFO - trial 21 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.40706161658517775}, 'max_tokens': 217, 'n': 5, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:03:20] {215} INFO - result: {'expected_success': 0.739152, 'success': 0.8, 'total_cost': 0.17876000000000006, 'cost': 0.029486000000000002, 'inference_cost': 0.0014743, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.40706161658517775}, 'max_tokens': 217, 'n': 5, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'temperature': 0.40706161658517775}, 'config/max_tokens': 217, 'config/n': 5, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 47.16692495346069}\n", "[flaml.tune.tune: 03-05 05:03:20] {811} INFO - trial 22 config: {'model': 'gpt-3.5-turbo', 'max_tokens': 174, 'n': 2, 'prompt': 0, 'stop': 0, 'temperature_or_top_p': {'temperature': 0.27048488009754645}}\n", "[flaml.tune.tune: 03-05 05:04:01] {215} INFO - result: {'expected_success': 0.5125, 'success': 0.55, 'total_cost': 0.19355200000000006, 'cost': 0.014792000000000003, 'inference_cost': 0.0007396000000000001, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'max_tokens': 174, 'n': 2, 'prompt': 0, 'stop': 0, 'temperature_or_top_p': {'temperature': 0.27048488009754645}}, 'config/model': 'gpt-3.5-turbo', 'config/max_tokens': 174, 'config/n': 2, 'config/prompt': 0, 'config/stop': 0, 'config/temperature_or_top_p': {'temperature': 0.27048488009754645}, 'experiment_tag': 'exp', 'time_total_s': 40.51927351951599}\n", "[flaml.tune.tune: 03-05 05:04:01] {811} INFO - trial 23 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.3413175996734835}, 'max_tokens': 275, 'n': 52, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:04:01] {215} INFO - result: 
{'inference_cost': inf, 'expected_success': -inf, 'cost': 0, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.3413175996734835}, 'max_tokens': 275, 'n': 52, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'temperature': 0.3413175996734835}, 'config/max_tokens': 275, 'config/n': 52, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 0.0007867813110351562}\n", "[flaml.tune.tune: 03-05 05:04:01] {811} INFO - trial 24 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.2645495244555195}, 'max_tokens': 499, 'n': 12, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:04:01] {215} INFO - result: {'inference_cost': inf, 'expected_success': -inf, 'cost': 0, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.2645495244555195}, 'max_tokens': 499, 'n': 12, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'temperature': 0.2645495244555195}, 'config/max_tokens': 499, 'config/n': 12, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 0.0006549358367919922}\n", "[flaml.tune.tune: 03-05 05:04:01] {811} INFO - trial 25 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.48492162197022287}, 'max_tokens': 174, 'n': 2, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:04:40] {215} INFO - result: {'expected_success': 0.55, 'success': 0.6, 'total_cost': 0.2079620000000001, 'cost': 0.01441, 'inference_cost': 0.0007205, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.48492162197022287}, 'max_tokens': 174, 'n': 2, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'temperature': 0.48492162197022287}, 'config/max_tokens': 174, 'config/n': 2, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 38.88523626327515}\n", "[flaml.tune.tune: 03-05 05:04:40] {811} INFO - trial 26 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.7008948011018361}, 'max_tokens': 188, 'n': 2, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:05:20] {215} INFO - result: {'expected_success': 0.6375, 'success': 0.65, 'total_cost': 0.22241600000000009, 'cost': 0.014454, 'inference_cost': 0.0007227000000000001, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.7008948011018361}, 'max_tokens': 188, 'n': 2, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'temperature': 0.7008948011018361}, 'config/max_tokens': 188, 'config/n': 2, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 40.07520294189453}\n", "[flaml.tune.tune: 03-05 05:05:20] {811} INFO - trial 27 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.45563880608336627}, 'max_tokens': 181, 'n': 1, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:05:54] {215} INFO - result: {'expected_success': 0.55, 'success': 0.55, 'total_cost': 0.23225200000000013, 'cost': 0.009836000000000001, 'inference_cost': 0.0004918, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.45563880608336627}, 'max_tokens': 181, 'n': 1, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'temperature': 0.45563880608336627}, 
'config/max_tokens': 181, 'config/n': 1, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 34.365720987319946}\n", "[flaml.tune.tune: 03-05 05:05:54] {811} INFO - trial 28 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.21155867162942757}, 'max_tokens': 183, 'n': 17, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:05:57] {215} INFO - result: {'expected_success': 0, 'total_cost': 0.23748400000000014, 'cost': 0.005232, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.21155867162942757}, 'max_tokens': 183, 'n': 17, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'temperature': 0.21155867162942757}, 'config/max_tokens': 183, 'config/n': 17, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 2.9915997982025146}\n", "[flaml.tune.tune: 03-05 05:05:57] {811} INFO - trial 29 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.652909170066013}, 'max_tokens': 285, 'n': 31, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:05:57] {215} INFO - result: {'inference_cost': inf, 'expected_success': -inf, 'cost': 0, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.652909170066013}, 'max_tokens': 285, 'n': 31, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'temperature': 0.652909170066013}, 'config/max_tokens': 285, 'config/n': 31, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 0.0005283355712890625}\n", "[flaml.tune.tune: 03-05 05:05:57] {811} INFO - trial 30 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'top_p': 0.9990495004030453}, 'max_tokens': 219, 'n': 18, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:06:02] {215} INFO - result: {'expected_success': 0, 'total_cost': 0.24319000000000013, 'cost': 0.005706, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'top_p': 0.9990495004030453}, 'max_tokens': 219, 'n': 18, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'top_p': 0.9990495004030453}, 'config/max_tokens': 219, 'config/n': 18, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 5.099469184875488}\n", "[flaml.tune.tune: 03-05 05:06:02] {811} INFO - trial 31 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.4467837016610728}, 'max_tokens': 404, 'n': 1, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:06:50] {215} INFO - result: {'expected_success': 0.6, 'success': 0.6, 'total_cost': 0.25467800000000024, 'cost': 0.011488, 'inference_cost': 0.0005744, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.4467837016610728}, 'max_tokens': 404, 'n': 1, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'temperature': 0.4467837016610728}, 'config/max_tokens': 404, 'config/n': 1, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 47.18360900878906}\n", "[flaml.tune.tune: 03-05 05:06:50] {811} INFO - trial 32 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.7150017857658078}, 'max_tokens': 469, 'n': 9, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:06:50] {215} INFO - result: {'inference_cost': inf, 'expected_success': -inf, 'cost': 0, 
'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.7150017857658078}, 'max_tokens': 469, 'n': 9, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'temperature': 0.7150017857658078}, 'config/max_tokens': 469, 'config/n': 9, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 0.000614166259765625}\n", "[flaml.tune.tune: 03-05 05:06:50] {811} INFO - trial 33 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.2594708806296415}, 'max_tokens': 352, 'n': 7, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:07:35] {215} INFO - result: {'expected_success': 0, 'total_cost': 0.29123200000000016, 'cost': 0.036554, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.2594708806296415}, 'max_tokens': 352, 'n': 7, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'temperature': 0.2594708806296415}, 'config/max_tokens': 352, 'config/n': 7, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 45.43464660644531}\n", "[flaml.tune.tune: 03-05 05:07:35] {811} INFO - trial 34 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.5691158455115929}, 'max_tokens': 520, 'n': 22, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:07:35] {215} INFO - result: {'inference_cost': inf, 'expected_success': -inf, 'cost': 0, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.5691158455115929}, 'max_tokens': 520, 'n': 22, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'temperature': 0.5691158455115929}, 'config/max_tokens': 520, 'config/n': 22, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 0.0005013942718505859}\n", "[flaml.tune.tune: 03-05 05:07:35] {811} INFO - trial 35 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.4357505186889488}, 'max_tokens': 153, 'n': 1, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:08:11] {215} INFO - result: {'expected_success': 0.6, 'success': 0.6, 'total_cost': 0.3012180000000001, 'cost': 0.009986, 'inference_cost': 0.0004993, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.4357505186889488}, 'max_tokens': 153, 'n': 1, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'temperature': 0.4357505186889488}, 'config/max_tokens': 153, 'config/n': 1, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 36.294803857803345}\n", "[flaml.tune.tune: 03-05 05:08:11] {811} INFO - trial 36 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.43174456068612144}, 'max_tokens': 244, 'n': 1, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:08:50] {215} INFO - result: {'expected_success': 0.45, 'success': 0.45, 'total_cost': 0.3115360000000001, 'cost': 0.010318, 'inference_cost': 0.0005159, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.43174456068612144}, 'max_tokens': 244, 'n': 1, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'temperature': 0.43174456068612144}, 'config/max_tokens': 244, 'config/n': 1, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 38.782007455825806}\n", 
"[flaml.tune.tune: 03-05 05:08:50] {811} INFO - trial 37 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.31174598735063297}, 'max_tokens': 152, 'n': 93, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:08:50] {215} INFO - result: {'inference_cost': inf, 'expected_success': -inf, 'cost': 0, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.31174598735063297}, 'max_tokens': 152, 'n': 93, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'temperature': 0.31174598735063297}, 'config/max_tokens': 152, 'config/n': 93, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 0.000728607177734375}\n", "[flaml.tune.tune: 03-05 05:08:50] {811} INFO - trial 38 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'top_p': 0.9998765149838305}, 'max_tokens': 968, 'n': 13, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:08:50] {215} INFO - result: {'inference_cost': inf, 'expected_success': -inf, 'cost': 0, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'top_p': 0.9998765149838305}, 'max_tokens': 968, 'n': 13, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'top_p': 0.9998765149838305}, 'config/max_tokens': 968, 'config/n': 13, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 0.0006527900695800781}\n", "[flaml.tune.tune: 03-05 05:08:50] {811} INFO - trial 39 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.4077967938262427}, 'max_tokens': 208, 'n': 6, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:09:37] {215} INFO - result: {'expected_success': 0.8148458933470506, 'success': 0.85, 'total_cost': 0.344804, 'cost': 0.03326799999999999, 'inference_cost': 0.0016634000000000002, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.4077967938262427}, 'max_tokens': 208, 'n': 6, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'temperature': 0.4077967938262427}, 'config/max_tokens': 208, 'config/n': 6, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 46.54340124130249}\n", "[flaml.tune.tune: 03-05 05:09:37] {811} INFO - trial 40 config: {'model': 'gpt-3.5-turbo', 'max_tokens': 340, 'n': 1, 'prompt': 0, 'stop': 0, 'temperature_or_top_p': {'temperature': 0.4404342494313882}}\n", "[flaml.tune.tune: 03-05 05:10:23] {215} INFO - result: {'expected_success': 0.75, 'success': 0.75, 'total_cost': 0.356122, 'cost': 0.011318000000000002, 'inference_cost': 0.0005658999999999999, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'max_tokens': 340, 'n': 1, 'prompt': 0, 'stop': 0, 'temperature_or_top_p': {'temperature': 0.4404342494313882}}, 'config/model': 'gpt-3.5-turbo', 'config/max_tokens': 340, 'config/n': 1, 'config/prompt': 0, 'config/stop': 0, 'config/temperature_or_top_p': {'temperature': 0.4404342494313882}, 'experiment_tag': 'exp', 'time_total_s': 45.89974808692932}\n", "[flaml.tune.tune: 03-05 05:10:23] {811} INFO - trial 41 config: {'model': 'gpt-3.5-turbo', 'max_tokens': 127, 'n': 16, 'prompt': 0, 'stop': 0, 'temperature_or_top_p': {'temperature': 0.37515933822109715}}\n", "[flaml.tune.tune: 03-05 05:10:26] {215} INFO - result: {'expected_success': 0, 'total_cost': 0.361062, 'cost': 0.00494, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'max_tokens': 127, 
'n': 16, 'prompt': 0, 'stop': 0, 'temperature_or_top_p': {'temperature': 0.37515933822109715}}, 'config/model': 'gpt-3.5-turbo', 'config/max_tokens': 127, 'config/n': 16, 'config/prompt': 0, 'config/stop': 0, 'config/temperature_or_top_p': {'temperature': 0.37515933822109715}, 'experiment_tag': 'exp', 'time_total_s': 3.5503623485565186}\n", "[flaml.tune.tune: 03-05 05:10:26] {811} INFO - trial 42 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.996156173020253}, 'max_tokens': 107, 'n': 7, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:11:06] {215} INFO - result: {'expected_success': 0.646968646445905, 'success': 0.7, 'total_cost': 0.39229600000000003, 'cost': 0.031234, 'inference_cost': 0.0015617, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.996156173020253}, 'max_tokens': 107, 'n': 7, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'temperature': 0.996156173020253}, 'config/max_tokens': 107, 'config/n': 7, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 40.09834337234497}\n", "[flaml.tune.tune: 03-05 05:11:06] {811} INFO - trial 43 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'top_p': 0.712309746815617}, 'max_tokens': 112, 'n': 77, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:11:06] {215} INFO - result: {'inference_cost': inf, 'expected_success': -inf, 'cost': 0, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'top_p': 0.712309746815617}, 'max_tokens': 112, 'n': 77, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'top_p': 0.712309746815617}, 'config/max_tokens': 112, 'config/n': 77, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 0.0007219314575195312}\n", "[flaml.tune.tune: 03-05 05:11:06] {811} INFO - trial 44 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.7694213309158455}, 'max_tokens': 226, 'n': 8, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:11:55] {215} INFO - result: {'expected_success': 0, 'total_cost': 0.42729200000000006, 'cost': 0.034996, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.7694213309158455}, 'max_tokens': 226, 'n': 8, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'temperature': 0.7694213309158455}, 'config/max_tokens': 226, 'config/n': 8, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 48.949331283569336}\n", "[flaml.tune.tune: 03-05 05:11:55] {811} INFO - trial 45 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.9557646172390091}, 'max_tokens': 293, 'n': 45, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:11:55] {215} INFO - result: {'inference_cost': inf, 'expected_success': -inf, 'cost': 0, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.9557646172390091}, 'max_tokens': 293, 'n': 45, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'temperature': 0.9557646172390091}, 'config/max_tokens': 293, 'config/n': 45, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 0.0007379055023193359}\n", "[flaml.tune.tune: 03-05 05:11:55] {811} INFO - trial 46 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 
0.9767564075397783}, 'max_tokens': 65, 'n': 16, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:12:03] {215} INFO - result: {'expected_success': 0, 'total_cost': 0.436042, 'cost': 0.008749999999999999, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.9767564075397783}, 'max_tokens': 65, 'n': 16, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'temperature': 0.9767564075397783}, 'config/max_tokens': 65, 'config/n': 16, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 8.102897882461548}\n", "[flaml.tune.tune: 03-05 05:12:03] {811} INFO - trial 47 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.3783227519390696}, 'max_tokens': 111, 'n': 6, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:12:39] {215} INFO - result: {'expected_success': 0.5908468364197531, 'success': 0.65, 'total_cost': 0.46333, 'cost': 0.027288, 'inference_cost': 0.0013644, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.3783227519390696}, 'max_tokens': 111, 'n': 6, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'temperature': 0.3783227519390696}, 'config/max_tokens': 111, 'config/n': 6, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 35.84658098220825}\n", "[flaml.tune.tune: 03-05 05:12:39] {811} INFO - trial 48 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.5239740220006481}, 'max_tokens': 150, 'n': 10, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:12:49] {215} INFO - result: {'expected_success': 0, 'total_cost': 0.47180400000000006, 'cost': 0.008474, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.5239740220006481}, 'max_tokens': 150, 'n': 10, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'temperature': 0.5239740220006481}, 'config/max_tokens': 150, 'config/n': 10, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 9.35022783279419}\n", "[flaml.tune.tune: 03-05 05:12:49] {811} INFO - trial 49 config: {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.4090242730676276}, 'max_tokens': 198, 'n': 6, 'prompt': 0, 'stop': 0}\n", "[flaml.tune.tune: 03-05 05:13:30] {215} INFO - result: {'expected_success': 0, 'total_cost': 0.500916, 'cost': 0.029112000000000002, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.4090242730676276}, 'max_tokens': 198, 'n': 6, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'temperature': 0.4090242730676276}, 'config/max_tokens': 198, 'config/n': 6, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 40.903329372406006}\n", "[flaml.tune.tune: 03-05 05:13:30] {834} WARNING - fail to sample a trial for 100 times in a row, stopping.\n" ] } ], "source": [ "import logging\n", "\n", "config, analysis = oai.ChatCompletion.tune(\n", " data=tune_data, # the data for tuning\n", " metric=\"expected_success\", # the metric to optimize\n", " mode=\"max\", # the optimization mode\n", " eval_func=success_metrics, # the evaluation function to return the success metrics\n", " # log_file_name=\"logs/math.log\", # the log file name\n", " inference_budget=0.002, # the inference budget (dollar)\n", " 
optimization_budget=0.5, # the optimization budget (dollar)\n", " # num_samples can further limit the number of trials for different hyperparameter configurations;\n", " # -1 means decided by the optimization budget only\n", " num_samples=-1,\n", " prompt=prompts, # the prompt templates to choose from\n", " stop=\"###\", # the stop sequence\n", " logging_level=logging.INFO, # the logging level\n", ")\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Output tuning results\n", "\n", "After the tuning, we can print out the optimized config and the result found by FLAML:" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "execution": { "iopub.execute_input": "2023-02-13T23:41:55.049204Z", "iopub.status.busy": "2023-02-13T23:41:55.048871Z", "iopub.status.idle": "2023-02-13T23:41:55.053284Z", "shell.execute_reply": "2023-02-13T23:41:55.052574Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "optimized config {'model': 'gpt-3.5-turbo', 'max_tokens': 208, 'n': 6, 'prompt': <function <lambda> at 0x7f80e405b430>, 'stop': '###', 'temperature': 0.4077967938262427}\n", "best result on tuning data {'expected_success': 0.8148458933470506, 'success': 0.85, 'total_cost': 0.344804, 'cost': 0.03326799999999999, 'inference_cost': 0.0016634000000000002, 'training_iteration': 0, 'config': {'model': 'gpt-3.5-turbo', 'temperature_or_top_p': {'temperature': 0.4077967938262427}, 'max_tokens': 208, 'n': 6, 'prompt': 0, 'stop': 0}, 'config/model': 'gpt-3.5-turbo', 'config/temperature_or_top_p': {'temperature': 0.4077967938262427}, 'config/max_tokens': 208, 'config/n': 6, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 46.54340124130249}\n" ] } ], "source": [ "print(\"optimized config\", config)\n", "print(\"best result on tuning data\", analysis.best_result)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Make a request with the tuned config\n", "\n", "We can apply the tuned config to the request for an example task:" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "execution": { "iopub.execute_input": "2023-02-13T23:41:55.056205Z", "iopub.status.busy": "2023-02-13T23:41:55.055631Z", "iopub.status.idle": "2023-02-13T23:41:56.039259Z", "shell.execute_reply": "2023-02-13T23:41:56.038427Z" }, "slideshow": { "slide_type": "subslide" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", " \"choices\": [\n", " {\n", " \"finish_reason\": \"stop\",\n", " \"index\": 0,\n", " \"message\": {\n", " \"content\": \"\\n\\nAnswer: Using the logarithmic identity $\\\\log_{a}(b\\\\cdot c)=\\\\log_{a}(b)+\\\\log_{a}(c)$, we can simplify the expression as follows: $$\\\\log_{10} 40 +\\\\log_{10} 25=\\\\log_{10}(40\\\\cdot 25)=\\\\log_{10}(1000)=\\\\boxed{3}.$$\",\n", " \"role\": \"assistant\"\n", " }\n", " },\n", " {\n", " \"finish_reason\": null,\n", " \"index\": 1,\n", " \"message\": {\n", " \"content\": \"\\n\\nAnswer: Using the logarithmic property $\\\\log_a b + \\\\log_a c = \\\\log_a (bc)$, we can combine the two logarithms to get $\\\\log_{10} 40 \\\\cdot 25$. Simplifying, we get $\\\\log_{10} 1000$. 
Since $10^3 = 1000$, we have $\\\\log_{10} 1000 = \\\\boxed{3}$.\",\n", " \"role\": \"assistant\"\n", " }\n", " },\n", " {\n", " \"finish_reason\": \"stop\",\n", " \"index\": 2,\n", " \"message\": {\n", " \"content\": \"\\n\\nAnswer: Using the logarithmic property $\\\\log_a b + \\\\log_a c = \\\\log_a (bc)$, we can simplify the expression as follows: $$\\\\log_{10} 40 + \\\\log_{10} 25 = \\\\log_{10} (40 \\\\cdot 25) = \\\\log_{10} 1000$$ Since $1000$ is equal to $10^3$, we have $\\\\log_{10} 1000 = \\\\boxed{3}$.\",\n", " \"role\": \"assistant\"\n", " }\n", " },\n", " {\n", " \"finish_reason\": \"stop\",\n", " \"index\": 3,\n", " \"message\": {\n", " \"content\": \"\\n\\nAnswer: Using the logarithmic identity $\\\\log_{a}(b\\\\cdot c) = \\\\log_{a}(b) + \\\\log_{a}(c)$, we can simplify the expression as follows:\\n\\n$$\\\\log_{10} 40 +\\\\log_{10} 25 = \\\\log_{10} (40\\\\cdot 25) = \\\\log_{10} 1000$$\\n\\nSince $1000 = 10^3$, we have $\\\\log_{10} 1000 = \\\\boxed{3}$.\",\n", " \"role\": \"assistant\"\n", " }\n", " },\n", " {\n", " \"finish_reason\": \"stop\",\n", " \"index\": 4,\n", " \"message\": {\n", " \"content\": \"\\n\\nAnswer: Using the logarithmic property $\\\\log_{a}(b) + \\\\log_{a}(c) = \\\\log_{a}(bc)$, we can simplify the expression to $\\\\log_{10}(40 \\\\cdot 25)$. Multiplying $40$ and $25$ gives us $1000$. Therefore, the expression simplifies to $\\\\log_{10}1000$. Since $10^3=1000$, we have $\\\\log_{10}1000 = \\\\boxed{3}$.\",\n", " \"role\": \"assistant\"\n", " }\n", " },\n", " {\n", " \"finish_reason\": \"stop\",\n", " \"index\": 5,\n", " \"message\": {\n", " \"content\": \"\\n\\nAnswer: Using the logarithmic identity $\\\\log_{a}(b) + \\\\log_{a}(c) = \\\\log_{a}(bc)$, we can simplify the expression to $\\\\log_{10}(40\\\\cdot25)$. Evaluating $40\\\\cdot25$ gives us $1000$, so our final answer is $\\\\log_{10}(1000) = \\\\boxed{3}$.\",\n", " \"role\": \"assistant\"\n", " }\n", " }\n", " ],\n", " \"created\": 1677992931,\n", " \"id\": \"chatcmpl-6qau3onXVENQuWDXUttbTe3rJ27vH\",\n", " \"model\": \"gpt-3.5-turbo-0301\",\n", " \"object\": \"chat.completion\",\n", " \"usage\": {\n", " \"completion_tokens\": 575,\n", " \"prompt_tokens\": 112,\n", " \"total_tokens\": 687\n", " }\n", "}\n", "{'expected_success': 1.0, 'success': True}\n" ] } ], "source": [ "responses = oai.ChatCompletion.create(context=tune_data[1], **config)\n", "print(responses)\n", "print(success_metrics([response[\"message\"][\"content\"].rstrip() for response in responses[\"choices\"]], **tune_data[1]))\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Evaluate the success rate on the test data\n", "\n", "You can use flaml's `oai.ChatCompletion.eval` to evaluate the performance of an entire dataset with the tuned config. To do that, first set `oai.ChatCompletion.data` to the data to evaluate. Evaluating all 437 test data instances takes a while; an optional low-cost sanity check on a small slice of the test data is sketched below, followed by the full evaluation."
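] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "The next cell is an optional, minimal sketch that is not part of the original workflow: it runs the same `eval` call on a small slice of the test data to estimate per-instance cost and runtime before committing to the full run. It assumes `test_data` is a list (so it supports slicing); the slice size of 10 is an arbitrary illustrative choice." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Optional sanity check (illustrative sketch): evaluate the tuned config on a small\n", "# slice of the test data first, to estimate cost and runtime before the full run.\n", "oai.ChatCompletion.data = test_data[:10] # assumes test_data is a list; 10 is an arbitrary slice size\n", "subset_result = oai.ChatCompletion.eval(analysis.best_config, prune=False, eval_only=True)\n", "print(subset_result)"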
] }, { "cell_type": "code", "execution_count": 34, "metadata": { "execution": { "iopub.execute_input": "2023-02-13T23:41:56.042764Z", "iopub.status.busy": "2023-02-13T23:41:56.042086Z", "iopub.status.idle": "2023-02-13T23:53:05.597643Z", "shell.execute_reply": "2023-02-13T23:53:05.596603Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'expected_success': 0.7719714162844925, 'success': 0.8123569794050344, 'total_cost': 1.1100199999999998, 'cost': 0.6091040000000002, 'inference_cost': 0.001393830663615561}\n" ] } ], "source": [ "oai.ChatCompletion.data = test_data\n", "result = oai.ChatCompletion.eval(analysis.best_config, prune=False, eval_only=True)\n", "print(result)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.16" }, "vscode": { "interpreter": { "hash": "949777d72b0d2535278d3dc13498b2535136f6dfe0678499012e853ee9abcab1" } }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": { "2d910cfd2d2a4fc49fc30fbbdc5576a7": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "454146d0f7224f038689031002906e6f": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HBoxModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HBoxModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "2.0.0", "_view_name": "HBoxView", "box_style": "", "children": [ "IPY_MODEL_e4ae2b6f5a974fd4bafb6abb9d12ff26", "IPY_MODEL_577e1e3cc4db4942b0883577b3b52755", "IPY_MODEL_b40bdfb1ac1d4cffb7cefcb870c64d45" ], "layout": "IPY_MODEL_dc83c7bff2f241309537a8119dfc7555", "tabbable": null, "tooltip": null } }, "577e1e3cc4db4942b0883577b3b52755": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "FloatProgressModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "FloatProgressModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "2.0.0", "_view_name": "ProgressView", "bar_style": "success", "description": "", 
"description_allow_html": false, "layout": "IPY_MODEL_2d910cfd2d2a4fc49fc30fbbdc5576a7", "max": 1, "min": 0, "orientation": "horizontal", "style": "IPY_MODEL_74a6ba0c3cbc4051be0a83e152fe1e62", "tabbable": null, "tooltip": null, "value": 1 } }, "6086462a12d54bafa59d3c4566f06cb2": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "74a6ba0c3cbc4051be0a83e152fe1e62": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "ProgressStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "ProgressStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "StyleView", "bar_color": null, "description_width": "" } }, "7d3f3d9e15894d05a4d188ff4f466554": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HTMLStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "StyleView", "background": null, "description_width": "", "font_size": null, "text_color": null } }, "b40bdfb1ac1d4cffb7cefcb870c64d45": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "2.0.0", "_view_name": "HTMLView", "description": "", "description_allow_html": false, "layout": "IPY_MODEL_f1355871cc6f4dd4b50d9df5af20e5c8", "placeholder": "​", "style": "IPY_MODEL_ca245376fd9f4354af6b2befe4af4466", "tabbable": null, "tooltip": null, "value": " 1/1 [00:00<00:00, 44.69it/s]" } }, "ca245376fd9f4354af6b2befe4af4466": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HTMLStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "StyleView", "background": null, "description_width": "", "font_size": null, "text_color": null } }, "dc83c7bff2f241309537a8119dfc7555": { "model_module": 
"@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "e4ae2b6f5a974fd4bafb6abb9d12ff26": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "2.0.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "2.0.0", "_view_name": "HTMLView", "description": "", "description_allow_html": false, "layout": "IPY_MODEL_6086462a12d54bafa59d3c4566f06cb2", "placeholder": "​", "style": "IPY_MODEL_7d3f3d9e15894d05a4d188ff4f466554", "tabbable": null, "tooltip": null, "value": "100%" } }, "f1355871cc6f4dd4b50d9df5af20e5c8": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "2.0.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "2.0.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border_bottom": null, "border_left": null, "border_right": null, "border_top": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } } }, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 2 }