"Copyright (c) Microsoft Corporation. All rights reserved. \n",
"\n",
"Licensed under the MIT License.\n",
"\n",
"# Use FLAML to Tune ChatGPT\n",
"\n",
"FLAML offers a cost-effective hyperparameter optimization technique [EcoOptiGen](https://arxiv.org/abs/2303.04673) for tuning Large Language Models. Our study finds that tuning hyperparameters can significantly improve the utility of LLMs.\n",
"In this notebook, we tune OpenAI ChatGPT (both GPT-3.5 and GPT-4) models for math problem solving. We use [the MATH benchmark](https://crfm.stanford.edu/helm/latest/?group=math_chain_of_thought) for measuring mathematical problem solving on competition math problems with chain-of-thoughts style reasoning.\n",
"\n",
"Related link: [Blogpost](https://microsoft.github.io/FLAML/blog/2023/04/21/LLM-tuning-math) based on this experiment.\n",
"FLAML has provided an API for hyperparameter optimization of OpenAI ChatGPT models: `autogen.ChatCompletion.tune` and to make a request with the tuned config: `autogen.ChatCompletion.create`. First, we import autogen from flaml:"
"The [`config_list_openai_aoai`](https://microsoft.github.io/FLAML/docs/reference/autogen/oai/openai_utils#config_list_openai_aoai) function tries to create a list of Azure OpenAI endpoints and OpenAI endpoints. It assumes the api keys and api bases are stored in the corresponding environment variables or local txt files:\n",
"\n",
"- OpenAI API key: os.environ[\"OPENAI_API_KEY\"] or `openai_api_key_file=\"key_openai.txt\"`.\n",
"- Azure OpenAI API key: os.environ[\"AZURE_OPENAI_API_KEY\"] or `aoai_api_key_file=\"key_aoai.txt\"`. Multiple keys can be stored, one per line.\n",
"- Azure OpenAI API base: os.environ[\"AZURE_OPENAI_API_BASE\"] or `aoai_api_base_file=\"base_aoai.txt\"`. Multiple bases can be stored, one per line.\n",
" }, # only if the second Azure OpenAI API key is found\n",
"]\n",
"```\n",
"\n",
"You can directly override it if the above function returns an empty list, i.e., it doesn't find the keys in the specified locations."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load dataset\n",
"\n",
"We load the competition_math dataset. The dataset contains 201 \"Level 2\" Algebra examples. We use a random sample of 20 examples for tuning the generation hyperparameters and the remaining for evaluation."
"First we begin by solving the system of equations \\begin{align*}\n",
"3+a&=4-b, \\\\\n",
"4+b&=7+a.\n",
"\\end{align*}Adding the two equations, we get $3+a+4+b=4-b+7+a$, which simplifies to $7+a+b=11+a-b$. Cancelling $a$ from both sides, we get $7+b=11-b$. Solving for $b$, we find that $b=2$. Plugging this into the first equation above, we obtain $3+a=4-2$. Hence $a=-1$ and $3-a=\\boxed{4}$.\n"
"Before we start tuning, we need to define the success metric we want to optimize. For each math task, we use voting to select a response with the most common answers out of all the generated responses. If it has an equivalent answer to the canonical solution, we consider the task as successfully solved. Then we can optimize the mean success rate of a collection of tasks."
"This will create a disk cache in \".cache/{seed}\". You can change `cache_path_root` from \".cache\" to a different path in `set_cache()`. The cache for different seeds are stored separately.\n",
"The tuning will take a while to finish, depending on the optimization budget. The tuning will be performed under the specified optimization budgets.\n",
"\n",
"* `inference_budget` is the target average inference budget per instance in the benchmark. For example, 0.004 means the target inference budget is 0.004 dollars, which translates to 2000 tokens (input + output combined) if the gpt-3.5-turbo model is used.\n",
"* `optimization_budget` is the total budget allowed to perform the tuning. For example, 1 means 1 dollars are allowed in total, which translates to 500K tokens for the gpt-3.5-turbo model.\n",
"* `num_sumples` is the number of different hyperparameter configurations which is allowed to try. The tuning will stop after either num_samples trials or after optimization_budget dollars spent, whichever happens first. -1 means no hard restriction in the number of trials and the actual number is decided by `optimization_budget`.\n",
"\n",
"Users can specify tuning data, optimization metric, optimization mode, evaluation function, search spaces etc.. The default search space is:\n",
"\n",
"```python\n",
"default_search_space = {\n",
" \"model\": tune.choice([\n",
" \"gpt-3.5-turbo\",\n",
" \"gpt-4\",\n",
" ]),\n",
" \"temperature_or_top_p\": tune.choice(\n",
" [\n",
" {\"temperature\": tune.uniform(0, 2)},\n",
" {\"top_p\": tune.uniform(0, 1)},\n",
" ]\n",
" ),\n",
" \"max_tokens\": tune.lograndint(50, 1000),\n",
" \"n\": tune.randint(1, 100),\n",
" \"prompt\": \"{prompt}\",\n",
"}\n",
"```\n",
"\n",
"The default search space can be overridden by users' input.\n",
"For example, the following code specifies a fixed prompt template. For hyperparameters which don't appear in users' input, the default search space will be used."
"[flaml.tune.tune: 08-01 22:42:49] {197} INFO - result: {'expected_success': 0.15989600488062986, 'success': 0.2, 'success_vote': 0.2, 'voted_answer': 'Note that to get from 6075 to 2025 or from 2025 to 675, we must divide by 3. Thus the sequence in question, which ends in an ellipsis or a couple of periods dots indicating more members come next, also begins with the labeling \"700, 300.\" Recall from arithmetic pattern insight to sports like basketball and from looking in our answer choices section', 'votes': 0.7, 'total_cost': 0.13852200000000003, 'cost': 0.031490000000000004, 'inference_cost': 0.0015442499999999998, 'training_iteration': 0, 'config': {'temperature_or_top_p': {'temperature': 1.5210614243979175}, 'max_tokens': 82, 'n': 9, 'prompt': 0, 'model': 'gpt-3.5-turbo', 'allow_format_str_template': True}, 'config/temperature_or_top_p': {'temperature': 1.5210614243979175}, 'config/max_tokens': 82, 'config/n': 9, 'config/prompt': 0, 'config/model': 'gpt-3.5-turbo', 'config/allow_format_str_template': True, 'experiment_tag': 'exp', 'time_total_s': 55.53780817985535}\n",
"[flaml.tune.tune: 08-01 22:46:28] {197} INFO - result: {'expected_success': 0.9818164607828072, 'success': 1.0, 'success_vote': 0.95, 'voted_answer': 'To find the number of integers in the sequence, we need to find when each term becomes less than 1. \\n\\nStarting with 6075, we divide by 3 to get $\\\\frac{6075}{3} = 2025$. Since 2025 is an integer, it is included in the sequence.\\n\\nDividing 2025 by 3, we get $\\\\frac{2025}{3} = 675$. Again, 675 is an integer, so it is included in the sequence.\\n\\nIf we divide 675 by 3, we get $\\\\frac{675}{3} = 225$. 225 is an integer, so it is included in the sequence.\\n\\nDividing 225 by 3, we get $\\\\frac{225}{3} = 75$. 75 is an integer, so it is included in the sequence.\\n\\nDividing 75 by 3, we get $\\\\frac{75}{3} = 25$. 25 is an integer, so it is included in the sequence.\\n\\nIf we divide 25 by 3, we get $\\\\frac{25}{3} \\\\approx 8.3333$, which is not an integer. Thus, 25 is the last integer in the sequence.\\n\\nThere are a total of $\\\\boxed{6}$ integers in the sequence.', 'votes': 34.85, 'total_cost': 0.463802, 'cost': 0.27552199999999993, 'inference_cost': 0.01310685, 'training_iteration': 0, 'config': {'temperature_or_top_p': {'temperature': 0.7466815201029384}, 'max_tokens': 375, 'n': 44, 'prompt': 0, 'model': 'gpt-3.5-turbo', 'allow_format_str_template': True}, 'config/temperature_or_top_p': {'temperature': 0.7466815201029384}, 'config/max_tokens': 375, 'config/n': 44, 'config/prompt': 0, 'config/model': 'gpt-3.5-turbo', 'config/allow_format_str_template': True, 'experiment_tag': 'exp', 'time_total_s': 201.2768588066101}\n",
"[flaml.tune.tune: 08-01 22:48:34] {197} INFO - result: {'expected_success': 0.06535605838853817, 'success': 0.1, 'success_vote': 0.1, 'voted_answer': 'To find out how many integers are in this sequence, we must determine the number of times 3 is a factor being successively divisible cases.\\n\\n\\nFor modern thought:\\nThe ultimate disaster approach ,\\nwill hit eighty year compound,\\ncos thirty pieces, successful trip necessitate; pounds prove evenly\\nHot before four boxes accumulate closely superior statistics prove Yet pale-eyed visionary spite.\\n\\n\\n\\n\\n\\nAnalyzer-based cipher elements yielded intervals This outcome integers.A reason.Brief Inspection Of available objects imply Par near Often Reason via options \\n\\nThe Ratio sum leaves ten; Five.\\n\\nReal Analy access tells not answer right I vary combinations&find divisions Prompt are strongSo inspection Replace Reverse', 'votes': 0.35, 'total_cost': 0.5708920000000002, 'cost': 0.046482, 'inference_cost': 0.00229385, 'training_iteration': 0, 'config': {'temperature_or_top_p': {'temperature': 1.8172977616173365}, 'max_tokens': 129, 'n': 9, 'prompt': 0, 'model': 'gpt-3.5-turbo', 'allow_format_str_template': True}, 'config/temperature_or_top_p': {'temperature': 1.8172977616173365}, 'config/max_tokens': 129, 'config/n': 9, 'config/prompt': 0, 'config/model': 'gpt-3.5-turbo', 'config/allow_format_str_template': True, 'experiment_tag': 'exp', 'time_total_s': 84.15163469314575}\n",
"[flaml.tune.tune: 08-01 22:49:36] {197} INFO - result: {'expected_success': 0.12519255101013854, 'success': 0.15, 'success_vote': 0.15, 'voted_answer': 'Let the original term of the sequence be $x$. There are $796-43= \\\\boxed{753}$ sequences/pro-edits until term the when you divide the sequence becomes less vo/volume of OR counting totals that =prime-number?(-)+lifeisticment real!', 'votes': 1.1, 'total_cost': 0.71616, 'cost': 0.145268, 'inference_cost': 0.007233149999999999, 'training_iteration': 0, 'config': {'temperature_or_top_p': {'temperature': 1.6573626526153533}, 'max_tokens': 57, 'n': 63, 'prompt': 0, 'model': 'gpt-3.5-turbo', 'allow_format_str_template': True}, 'config/temperature_or_top_p': {'temperature': 1.6573626526153533}, 'config/max_tokens': 57, 'config/n': 63, 'config/prompt': 0, 'config/model': 'gpt-3.5-turbo', 'config/allow_format_str_template': True, 'experiment_tag': 'exp', 'time_total_s': 62.10028266906738}\n",
"[flaml.tune.tune: 08-01 22:51:50] {197} INFO - result: {'expected_success': 0.8499999999999934, 'success': 0.85, 'success_vote': 0.85, 'voted_answer': 'We can write the given sequence as $3^4 \\\\cdot 5^2, 3^4 \\\\cdot 5^1, 3^4 \\\\cdot 5^0, \\\\ldots$. We want to find the number of integers in this sequence. \\n\\nNotice that the exponent of 3 stays constant at 4, while the exponent of 5 decreases by 1 each time. We want to find the largest integer $n$ such that $3^4 \\\\cdot 5^n$ is an integer. \\n\\nSince $3^4$ is an integer, we only need to consider the exponent of 5. We want $5^n$ to be an integer, so $n$ must be nonnegative. However, we also want $5^n$ to be a factor of $3^4$, so $n$ must be less than or equal to 4. \\n\\nTherefore, the possible values of $n$ are 0, 1, 2, 3, and 4. There are $\\\\boxed{5}$ integers in the sequence.', 'votes': 33.8, 'total_cost': 0.9523240000000001, 'cost': 0.23616399999999999, 'inference_cost': 0.010300450000000001, 'training_iteration': 0, 'config': {'temperature_or_top_p': {'top_p': 0.1989475396788123}, 'max_tokens': 650, 'n': 35, 'prompt': 0, 'model': 'gpt-3.5-turbo', 'allow_format_str_template': True}, 'config/temperature_or_top_p': {'top_p': 0.1989475396788123}, 'config/max_tokens': 650, 'config/n': 35, 'config/prompt': 0, 'config/model': 'gpt-3.5-turbo', 'config/allow_format_str_template': True, 'experiment_tag': 'exp', 'time_total_s': 134.67861104011536}\n",
"optimized config {'max_tokens': 375, 'n': 44, 'prompt': '{problem} Solve the problem carefully. Simplify your answer as much as possible. Put the final answer in \\\\boxed{{}}.', 'model': 'gpt-3.5-turbo', 'allow_format_str_template': True, 'temperature': 0.7466815201029384}\n",
"best result on tuning data {'expected_success': 0.9818164607828072, 'success': 1.0, 'success_vote': 0.95, 'voted_answer': 'To find the number of integers in the sequence, we need to find when each term becomes less than 1. \\n\\nStarting with 6075, we divide by 3 to get $\\\\frac{6075}{3} = 2025$. Since 2025 is an integer, it is included in the sequence.\\n\\nDividing 2025 by 3, we get $\\\\frac{2025}{3} = 675$. Again, 675 is an integer, so it is included in the sequence.\\n\\nIf we divide 675 by 3, we get $\\\\frac{675}{3} = 225$. 225 is an integer, so it is included in the sequence.\\n\\nDividing 225 by 3, we get $\\\\frac{225}{3} = 75$. 75 is an integer, so it is included in the sequence.\\n\\nDividing 75 by 3, we get $\\\\frac{75}{3} = 25$. 25 is an integer, so it is included in the sequence.\\n\\nIf we divide 25 by 3, we get $\\\\frac{25}{3} \\\\approx 8.3333$, which is not an integer. Thus, 25 is the last integer in the sequence.\\n\\nThere are a total of $\\\\boxed{6}$ integers in the sequence.', 'votes': 34.85, 'total_cost': 0.463802, 'cost': 0.27552199999999993, 'inference_cost': 0.01310685, 'training_iteration': 0, 'config': {'temperature_or_top_p': {'temperature': 0.7466815201029384}, 'max_tokens': 375, 'n': 44, 'prompt': 0, 'model': 'gpt-3.5-turbo', 'allow_format_str_template': True}, 'config/temperature_or_top_p': {'temperature': 0.7466815201029384}, 'config/max_tokens': 375, 'config/n': 44, 'config/prompt': 0, 'config/model': 'gpt-3.5-turbo', 'config/allow_format_str_template': True, 'experiment_tag': 'exp', 'time_total_s': 201.2768588066101}\n"
" \"content\": \"We start by solving the first equation for $a$: $$3+a=4-b.$$Adding $-3$ to both sides gives $a=1-b$. Substituting this expression for $a$ into the second equation gives $$4+b=7+(1-b).$$Simplifying this expression, we find that $b=2$. Substituting $b=2$ into the first equation to solve for $a$, we find that $a=1-2=-1$. Finally, we have $3-a=3-(-1)=3+1=\\\\boxed{4}$.\"\n",
" \"content\": \"Adding $a$ to both sides of the first equation gives $3+a+a=4-b+a$, which simplifies to $3+2a=4+a-b$. Adding $b$ to both sides of the second equation gives $4+b+b=7+a+b$, which simplifies to $4+2b=7+a+b$. Rearranging the equations gives $2a-b=1$ and $2b-a=3$. Adding these two equations gives $3a=4$, so $a=\\\\frac{4}{3}$. Substituting this into the first equation gives $2\\\\left(\\\\frac{4}{3}\\\\right)-b=1$, so $b=\\\\frac{5}{3}$. Thus, $3-a=3-\\\\frac{4}{3}=\\\\frac{9}{3}-\\\\frac{4}{3}=\\\\boxed{\\\\frac{5}{3}}$.\"\n",
" \"content\": \"We can first rewrite the given equations as $a+b=1$ and $a-b=3$. Adding these equations, we get $(a+b)+(a-b)=1+3$, which simplifies to $2a=4$. Dividing both sides by 2, we find $a=2$. Substituting this value of $a$ into the first equation, we have $2+b=1$, so $b=-1$. Finally, we can compute $3-a=3-2=\\\\boxed{1}$.\"\n",
" \"content\": \"From the first equation, we have $a=4-b-3$. Substituting this into the second equation, we get $4+b=7+4-b-3$. Simplifying the right side gives us $4+b=8-b$. Adding $b$ to both sides gives $4+2b=8$. Subtracting 4 from both sides gives $2b=4$ and dividing both sides by 2 gives $b=2$. Substituting this into the first equation gives $3+a=4-2$, so $a=-1$. Finally, $3-a=3-(-1)=3+1=\\\\boxed{4}$.\"\n",
" \"content\": \"Adding the two equations, we have $(3+a)+(4+b)=(4-b)+(7+a)$. Expanding both sides gives $7+a+b=11+a-b$. Subtracting $a$ and $b$ from both sides gives $7=11$, which is impossible. Thus, there is no solution to the given equations.\"\n",
" \"content\": \"We are given the equations $3+a=4-b$ and $4+b=7+a$. We can solve for $a$ and $b$ by using elimination.\\n\\nWe can eliminate $a$ by adding the two equations together. This gives us $(3+a)+(4+b)=(4-b)+(7+a)$. Simplifying both sides gives us $7+a+b=11+a-b$. We can then eliminate $a$ by subtracting $a$ from both sides, which gives us $7+b=11-b$. Adding $b$ to both sides gives us $7+2b=11$. Subtracting $7$ from both sides gives us $2b=4$. Dividing both sides by $2$, we get $b=2$.\\n\\nNow that we know the value of $b$, we can substitute it back into one of the original equations to solve for $a$. Let's use the equation $4+b=7+a$. Since we found that $b=2$, we have $4+2=7+a$. Simplifying gives us $6=7+a$. Subtracting $7$ from both sides gives us $-1=a$.\\n\\nFinally, we can find $3-a$ by substituting the value of $a$ that we found back into the expression. We have $3-a=3-(-1)$. Simplifying gives us $3-(-1)=3+1=\\\\boxed{4}$.\"\n",
" \"content\": \"We can start by solving the given equations separately. From $3+a=4-b$, we can add $b$ to both sides to find $3+a+b=4$. Similarly, from $4+b=7+a$, we can subtract $a$ from both sides to find $4+b-a=7$. Adding these two equations together, we have $3+a+b+4+b-a=4+7$, which simplifies to $7+b=11$. Subtracting $7$ from both sides gives us $b=4$. Substituting this value of $b$ into the equation $4+b-a=7$, we find $4+4-a=7$. Simplifying this equation further, we have $8-a=7$. Subtracting $8$ from both sides gives us $-a=-1$, or $a=1$. Finally, we can substitute the values of $a$ and $b$ into the expression $3-a$, which gives us $3-1=\\\\boxed{2}$.\"\n",
" \"content\": \"Starting with the first equation, we have $3+a=4-b$. Rearranging, we get $a=-1-b$. Substituting this into the second equation, we have $4+b=7+(-1-b)$. Simplifying, we get $b=-2$. Substituting this value of $b$ back into the first equation, we have $3+a=4-(-2)$. Simplifying further, we have $3+a=6$. Subtracting $3$ from both sides, we get $a=3$. Finally, we can find $3-a=3-3=\\\\boxed{0}$.\"\n",
" \"content\": \"Adding the two equations gives $(3+a)+(4+b)=(4-b)+(7+a)$. Simplifying both sides gives $7+a+b=11+a-b$. Subtracting $a$ from both sides gives $7+b=11-b$. Adding $b$ to both sides gives $b+b=11-7$, so $2b=4$ and $b=2$. Substituting this value back into either equation gives $4+2=7+a$, so $a=4$. Therefore, $3-a=3-4=\\\\boxed{-1}$.\"\n",
" \"content\": \"Rearranging the first equation, we have $a=4-b-3$ and rearranging the second equation, we have $b=7+a-4$. Substituting these expressions for $a$ and $b$ into the equation $3-a$, we have $3-(4-b-3)$. Simplifying the expression inside the parentheses, we have $3-(4-b-3)=3-4+b+3=9+b$. Substituting the expression for $b$ into $9+b$, we have $9+(7+a-4)=9+7+a-4=12+a$. Finally, substituting the expression for $a$ into $12+a$, we have $12+(4-b-3)=12+4-b-3=\\\\boxed{10-b}$.\"\n",
" \"content\": \"We have the system of equations \\\\begin{align*}\\n3+a&=4-b\\\\\\\\\\n4+b&=7+a\\n\\\\end{align*} Rearranging the first equation, we have $a+b=1$. Substituting this into the second equation, we get $4+1=7+a$, so $a=-4$. Thus, $3-a=\\\\boxed{7}$.\"\n",
" \"content\": \"Simplifying the first equation, we have $a=1-b$. Substituting this into the second equation, we have $4+b=7+(1-b)$. Expanding the right side gives $4+b=7+1-b$. Combining like terms gives $2b=4$, so $b=2$. Substituting this back into $a=1-b$, we find that $a=-1$. Thus, $3-a=3-(-1)=3+1=\\\\boxed{4}$.\"\n",
" \"content\": \"From the first equation, we have $a=4-b-3$. Substituting this into the second equation, we have $4+b=7+(4-b-3)$. Simplifying the right side of the equation gives $4+b=8-b$. Adding $b$ to both sides gives $4+2b=8$. Subtracting 4 from both sides gives $2b=4$. Dividing both sides by 2 gives $b=2$. Substituting this value back into the first equation gives $3+a=4-2$. Simplifying the right side gives $3+a=2$. Subtracting 3 from both sides gives $a=-1$. Finally, we have $3-a=3-(-1)=3+1=\\\\boxed{4}$.\"\n",
" \"content\": \"From the first equation, subtracting $a$ and adding $4$ to both sides gives $7=b-a$. Substituting this into the second equation gives $4+(b-a)=7+a$, so $4+7=b+a$. Combining these equations gives $3+b+a=11+a$, so $b=\\\\boxed{8}$. Substituting into the first equation gives $3+a=4-8$ which gives $a=\\\\boxed{-9}$. Finally, $3-a=3-(-9)=3+9=\\\\boxed{12}$.\"\n",
" \"content\": \"We can start by solving the first equation for $a$ in terms of $b$. Subtracting $3$ from both sides of the equation $3+a=4-b$ gives $a=1-b$. We can substitute this expression for $a$ in the second equation to solve for $b$: \\\\begin{align*}\\n4+b&=7+a\\\\\\\\\\n4+b&=7+(1-b)\\\\\\\\\\n4+b&=8-b\\\\\\\\\\n2b&=4\\\\\\\\\\nb&=2.\\n\\\\end{align*}Substituting this value of $b$ back into the first equation to solve for $a$, we have $3+a=4-2$, so $a=-1$. Finally, we can find $3-a=3-(-1)=\\\\boxed{4}$.\"\n",
" \"content\": \"From the first equation, we have $a=1-b$. Substituting this into the second equation gives $4+b=7+1-b$. Simplifying gives $2b=4$, so $b=2$. Substituting this back into the first equation gives $3+a=4-2$, so $a=-1$. Therefore, $3-a=3-(-1)=4$. Simplifying gives $\\\\boxed{4}$.\"\n",
" \"content\": \"Adding the two given equations, we have $(3+a)+(4+b)=(4-b)+(7+a)$. Simplifying both sides gives $7+a+b=11+a-b$. We can subtract $a$ from both sides to get $7+b=11-b$. Adding $b$ to both sides gives $7+2b=11$. Subtracting 7 from both sides gives $2b=4$. Dividing by 2 gives $b=2$.\\n\\nWe can substitute $b=2$ into the first equation $3+a=4-b$ to solve for $a$. We have $3+a=4-2$ which simplifies to $3+a=2$. Subtracting 3 from both sides gives $a=-1$.\\n\\nFinally, we can substitute $a=-1$ into $3-a$ to find $3-a=3-(-1)$. Simplifying gives $3-a=3+1=\\\\boxed{4}$.\"\n",
" \"content\": \"Adding the two given equations, we have $(3+a)+(4+b)=(4-b)+(7+a)$. Simplifying both sides gives $7+a+b=11+a-b$. Subtracting $a$ from both sides gives $7+b=11-b$. Adding $b$ to both sides gives $7+2b=11$. Subtracting $7$ from both sides gives $2b=4$. Finally, dividing both sides by $2$ gives $b=2$. Substituting this value for $b$ into the second given equation, we have $4+2=7+a$. Simplifying gives $a=-1$. Therefore, $3-a=3-(-1)=4$. Thus, the final answer is $\\\\boxed{4}$.\"\n",
" \"content\": \"Let's start by simplifying the given equations. We have $3+a=4-b$, which we can rearrange to get $a=-b+1$. Similarly, we have $4+b=7+a$, which rearranges to $b=a+3$. \\n\\nWe can substitute the value of $b$ from the second equation into the first equation to get $a=(-a-3)+1$. Simplifying this equation gives $2a=-2$, so $a=-1$. \\n\\nSubstituting this value of $a$ into the second equation gives $b=(-1)+3$, so $b=2$. \\n\\nFinally, we can find $3-a$ by substituting $a=-1$ into $3-a$. This gives $3-(-1)=3+1=\\\\boxed{4}$.\"\n",
" \"content\": \"We have the equations $3+a=4-b$ and $4+b=7+a$. We can solve these equations using substitution or elimination. Let's solve it using elimination.\\n\\nTo eliminate $a$, we can add the first equation to the second equation. This gives us $(3+a)+(4+b)=(4-b)+(7+a)$. Simplifying both sides, we have $7 + a + b = 11 + a - b$.\\n\\nNow, let's isolate $b$ by subtracting $a$ from both sides: $7 + b = 11 - b$.\\n\\nTo isolate $b$ on one side, we can add $b$ to both sides: $7 + 2b = 11$.\\n\\nSubtracting $7$ from both sides gives $2b= 4$. Dividing both sides by $2$, we find $b=2$.\\n\\nNow, we can substitute $b=2$ into the second equation $4+b=7+a$. This gives $4+2=7+a$, or $6=7+a$. Subtracting $7$ from both sides gives $-1=a$.\\n\\nFinally, we can find $3-a$ by substituting $a=-1$ into $3-a$. This gives $3-(-1)=3+1=\\\\boxed{4}$.\"\n",
" \"content\": \"We can start by combining like terms in both equations. From the first equation, we have $a+3=b-4$, and from the second equation, we have $b+4=a-7$. \\n\\nNow, we can substitute $b-4$ for $a+3$ in the second equation, since they are equal. This gives us $b+4=(b-4)-7$. \\n\\nSimplifying, we have $b+4=b-11$. \\n\\nSubtracting $b$ from both sides, we get $4=-11$. \\n\\nThis is a contradiction, since $4$ does not equal $-11$. \\n\\nTherefore, the solution to this system of equations does not exist, and we cannot find the value of $3-a$. Thus, the answer is $\\\\boxed{\\\\text{DNE}}$.\"\n",
" \"content\": \"We can start by solving the first equation, $3+a=4-b$, for $a$ in terms of $b$ by subtracting $3$ from both sides and then adding $b$ to both sides. This gives us $a = 1-b$.\\n\\nWe can substitute this expression for $a$ into the second equation, $4+b=7+a$, to solve for $b$ in terms of $a$. After simplifying, we have $b=4-a$.\\n\\nTo find $3-a$, we substitute $b=4-a$ into the first equation $3+a=4-b$. This gives us $3+a=4-(4-a)$.\\n\\nSimplifying this equation gives $3+a=4-4+a$, so $3+a=a$.\\n\\nTherefore, $3-a = \\\\boxed{3}$.\"\n",
" \"content\": \"To solve this problem, we can start by solving the first equation $3+a=4-b$ for $b$. Subtracting $3$ from both sides gives $a=1-b$. We can substitute this into the second equation $4+b=7+a$ to get $4+b=7+(1-b)$. Expanding the right side gives $4+b=7+1-b$. Combining like terms gives $b+b=7+1-4$. Simplifying the right side gives $2b=4$. Dividing both sides by $2$ gives $b=2$. Now we can substitute this back into the first equation to solve for $a$. We have $3+a=4-2$, so $3+a=2$. Subtracting $3$ from both sides gives $a=-1$. Finally, we can find $3-a$ by subtracting $a$ from $3$. We have $3-a=3-(-1)=3+1=\\\\boxed{4}$.\"\n",
" \"content\": \"Starting with the first equation, we have $3+a=4-b$. Rearranging this equation, we get $a=-b+1$. \\n\\nSubstituting this expression for $a$ into the second equation, we have $4+b=7+(-b+1)$. Simplifying this equation gives $b=-2$. \\n\\nSubstituting this value of $b$ into the equation $a=-b+1$, we find $a=3$. \\n\\nFinally, we can evaluate $3-a$ using the value we found for $a$. We have $3-a=3-3=\\\\boxed{0}$.\"\n",
" \"content\": \"Adding the two given equations, we have $(3+a)+(4+b)=(4-b)+(7+a)$. Simplifying both sides gives $7+a+b=11+a-b$. Subtracting $a$ from both sides gives $7+b=11-b$. Adding $b$ to both sides gives $7+2b=11$. Subtracting $7$ from both sides gives $2b=4$. Dividing both sides by $2$ gives $b=2$. Substituting this value of $b$ into the first given equation, we have $3+a=4-2$, or $a=-1$. Finally, substituting these values into $3-a$, we have $3-(-1)=3+1=\\\\boxed{4}$.\"\n",
" \"content\": \"From the first equation, $3+a=4-b$, we can subtract $3$ from both sides to find that $a=1-b$. Substituting this into the second equation, we have $4+b=7+(1-b)$. Expanding the right side gives $4+b=7+1-b$. We can then simplify this to $b+4=8-b$ by combining like terms. Adding $b$ to both sides gives $2b+4=8$. Subtracting $4$ from both sides gives $2b=4$. Finally, dividing both sides by $2$ gives $b=2$. \\n\\nNow that we have found the value of $b$, we can substitute it back into the first equation to find $3+a=4-2$. Simplifying gives $3+a=2$. Subtracting $3$ from both sides gives $a=-1$. \\n\\nFinally, we can find $3-a$ by subtracting $-1$ from $3$. This gives $3-(-1)=3+1=\\\\boxed{4}$.\"\n",
" \"content\": \"We can start by solving the first equation, $3+a=4-b$, for $a$. Subtracting 3 from both sides gives $a=1-b$. \\n\\nWe can substitute this value for $a$ into the second equation $4+b=7+a$. Substituting gives $4+b=7+(1-b)$. Expanding the right side gives $4+b=7+1-b$. Combining like terms gives $4+b=8-b$. \\n\\nAdding $b$ to both sides gives $4=8-2b$. Subtracting 8 from both sides gives $-4=-2b$. Dividing both sides by $-2$ gives $2=b$.\\n\\nNow that we know $b=2$, we can substitute this value back into the first equation to find $a$. Substituting gives $3+a=4-2$. Simplifying gives $3+a=2$. Subtracting 3 from both sides gives $a=-1$.\\n\\nFinally, we can find $3-a$ by substituting the value of $a$ we found into $3-a$. Substituting gives $3-(-1)$. Simplifying gives $3+1=4$.\\n\\nThus, $3-a=\\\\boxed{4}$.\"\n",
" \"content\": \"Starting with the first equation, we can subtract $a$ from both sides to isolate $3$:\\n\\n\\\\[3+a-a=4-b-a\\\\qquad\\\\Rightarrow\\\\qquad 3=4-b-a.\\\\]\\n\\nRearranging the terms, we have $b+a=4-3=1.$ Similarly, starting with the second equation, we can subtract $b$ from both sides to obtain $a+b=7-4=3.$ Adding these two equations, we have $2a+2b=4,$ so $a+b=2.$ Subtracting this equation from $b+a=1,$ we get $b-a=\\\\boxed{-1}.$\"\n",
" \"content\": \"From the first equation, we have $a=4-b-3=-b+1$. Substituting this into the second equation, we get $4+b=7+(-b+1)$, which simplifies to $4+b=8-b$. Solving for $b$, we find $2b=4$, so $b=2$. Substituting this back into $a=-b+1$, we find $a=-2+1=-1$. Finally, $3-a=3-(-1)=3+1=\\\\boxed{4}$.\"\n",
" \"content\": \"We start by solving the first equation for $a$ by subtracting 3 from both sides to find $a=1-b$. Substituting this into the second equation gives $4+b=7+(1-b)$. Expanding the brackets gives $4+b=7+1-b$. Simplifying gives $2b=4$, so $b=2$. Substituting this into $a=1-b$ gives $a=1-2=-1$. Finally, substituting this into $3-a$ gives $3-(-1)=4$, so our final answer is $\\\\boxed{4}$.\"\n",
" \"content\": \"Adding the two given equations, we have $(3+a)+(4+b)=(4-b)+(7+a)$. Applying the commutative property of addition, we can rearrange the terms to get $(3+4)+(a+b)=(4+7)+(-b+a)$. Simplifying both sides gives $7+(a+b)=11+(a-b)$. We can rewrite this equation as $a+b+7=a-b+11$. Subtracting $a+b+4$ from both sides yields $7-4=a-b+11-(a+b+4)$. Simplifying gives $3=-4-b$. Adding $b$ to both sides gives $3+b=-4$. Subtracting 4 from both sides yields $b-1=-4$. Then, adding 1 to both sides gives $b=-3$. Substituting this into the first equation $3+a=4-b$, we can substitute $-3$ for $b$ to get $3+a=4-(-3)$. Simplifying gives $3+a=4+3$. Subtracting 3 from both sides yields $a=4$. Finally, substituting this into $3-a$, we can substitute $4$ for $a$ to get $3-4=\\\\boxed{-1}$.\"\n",
" \"content\": \"From the first equation, we have $a=4-b-3=1-b$. Substituting this into the second equation gives $4+b=7+(1-b)$. Expanding the right side gives $4+b=7+1-b$. Simplifying the right side gives $4+b=8-b$. Adding $b$ to both sides gives $4+2b=8$. Subtracting 4 from both sides gives $2b=4$. Dividing both sides by 2 gives $b=2$. Substituting this into the first equation gives $3+a=4-2$. Simplifying the right side gives $3+a=2$. Subtracting 3 from both sides gives $a=-1$. Finally, we have $3-a=3-(-1)=\\\\boxed{4}$.\"\n",
" \"content\": \"We start by solving the first equation for $a$: \\\\begin{align*}\\n3+a&=4-b \\\\\\\\\\na&=1-b.\\n\\\\end{align*}We substitute this expression for $a$ into the second equation: \\\\begin{align*}\\n4+b&=7+a \\\\\\\\\\n4+b&=7+(1-b) \\\\\\\\\\n4+b&=8-b.\\n\\\\end{align*}Adding $b$ to both sides gives $4+2b=8$, so $2b=4$ and $b=2$. Substituting this value into $a=1-b$ gives $a=1-2=-1$. Therefore, $3-a=3-(-1)=3+1=4$. Thus, $3-a=\\\\boxed{4}$.\"\n",
" \"content\": \"We start by solving the first equation, $3+a=4-b$, for $a$. Subtracting $3$ from both sides gives $a=1-b$.\\n\\nWe can substitute this expression for $a$ into the second equation, $4+b=7+a$. Substituting $1-b$ for $a$ gives $4+b=7+(1-b)$.\\n\\nExpanding the parentheses gives $4+b=7+1-b$. Simplifying the right side gives $4+b=8-b$.\\n\\nAdding $b$ to both sides gives $4+2b=8$. Subtracting $4$ from both sides gives $2b=4$.\\n\\nFinally, dividing both sides by $2$ gives $b=2$. We can substitute this value back into the equation $a=1-b$ to find $a=1-2=-1$.\\n\\nNow, we can find $3-a$ by subtracting $(-1)$ from $3$. This gives $3-(-1)=3+1=\\\\boxed{4}$.\"\n",
" \"content\": \"Combining the two equations, we have $3+a+4+b=4-b+7+a$. Simplifying both sides, we have $7+a+b=11+a-b$. Subtracting $a$ from both sides, we have $7+b=11-b$. Adding $b$ to both sides, we have $7+2b=11$. Subtracting 7 from both sides, we have $2b=4$. Dividing both sides by 2, we have $b=2$. Substituting this value back into the first equation, we have $3+a=4-2$, so $3+a=2$. Subtracting 3 from both sides, we have $a=-1$. Therefore, $3-a=3-(-1)=3+1=\\\\boxed{4}$.\"\n",
" \"content\": \"We can start by subtracting $a$ from both sides of the first equation to find that $3 = 4 - b - a$. Similarly, we can subtract $a$ from both sides of the second equation to find that $4 = 7 + a - b$. Adding these two equations gives $3 + 4 = 4 - b - a + 7 + a - b$, which simplifies to $7 = 11 - 2b$. Solving for $b$ gives $b = 2$. Substituting this value into the first equation gives $3 + a = 4 - 2$, so $a = 3$. Then $3 - a = 3 - 3 = \\\\boxed{0}$.\"\n",
" \"content\": \"Adding the two given equations, we have $$(3+a)+(4+b)=(4-b)+(7+a).$$Expanding both sides gives $$7+a+b=11+a-b.$$We can then cancel out the $a$ term on both sides to get $$7+b=11-b.$$Adding $b$ to both sides gives $$7+2b=11.$$Subtracting $7$ from both sides gives $$2b=4.$$Dividing both sides by $2$ gives $$b=2.$$Plugging this value of $b$ into either of the original equations, we can solve for $a$. Using the first equation, we have $$3+a=4-2 \\\\Rightarrow a=-1.$$Finally, we can find $3-a$ as $$3-a=3-(-1)=3+1=\\\\boxed{4}.$$\"\n",
" \"content\": \"We can start by adding $a$ to both sides of the first equation and subtracting $b$ from both sides of the second equation to obtain \\\\begin{align*}\\na+b&=1, \\\\\\\\\\na-b&=-3.\\n\\\\end{align*} We can then add these equations to eliminate $b$: $$2a=1+(-3)=-2.$$Dividing both sides by $2$ gives $a=-1$. Substituting into the second equation gives $-1-b=-3$, so $b=2$. Finally, we find that $3-a=3-(-1)=\\\\boxed{4}$.\"\n",
" \"content\": \"We can start by subtracting $a$ from both sides of the first equation and subtracting $b$ from both sides of the second equation to obtain \\\\begin{align*}\\n3&=4-b-a,\\\\\\\\\\n4&=7+a-b.\\n\\\\end{align*}We can rearrange the first equation to get $b+a=4-3=1$. Similarly, we can rearrange the second equation to get $a-b=4-7=-3$. Adding these equations, we find that $(b+a)+(a-b)=1+(-3)$, which implies $2a= -2$. Hence, $a=-1$. We can substitute this value of $a$ into $a-b=-3$ to find that $-1-b=-3$, so $b=-1-(-3)=2$. Finally, we have \\\\begin{align*}\\n3-a&=3-(-1)=3+1=\\\\boxed{4}.\\n\\\\end{align*}\"\n",
" \"content\": \"Adding the two given equations, we have $$(3+a)+(4+b)=(4-b)+(7+a).$$Simplifying both sides gives $7+a+b=11+a-b$. Subtracting $a$ and $b$ from both sides gives $7=11$, which is a contradiction. Therefore, there are no solutions to the given equations, and the value of $3-a$ is undefined. So we have $3-a=\\\\boxed{ \\\\text{undefined}}$.\"\n",
" \"content\": \"To solve this problem, we can start by isolating $a$ in both equations. \\n\\nFrom the first equation, $3+a=4-b$, we can subtract 3 from both sides to get $a=1-b$. \\n\\nFrom the second equation, $4+b=7+a$, we can subtract 4 from both sides to get $b=3+a$. \\n\\nNow, we can substitute $1-b$ for $a$ in the second equation to get $b=3+1-b$. Simplifying this equation gives $2b=4$, so $b=2$. \\n\\nSubstituting $b=2$ into the equation $a=1-b$, we find that $a=1-2=-1$. \\n\\nFinally, we can find $3-a$ by subtracting $(-1)$ from 3, which gives us $3-(-1)=\\\\boxed{4}$.\"\n",
" \"content\": \"We start by solving the first equation for $a$ in terms of $b$. Subtracting $3$ from both sides, we have $a=1-b$. Substituting this into the second equation, we get $4+b=7+(1-b)$. Simplifying, we have $4+b=8-b$. Adding $b$ to both sides, we have $4+2b=8$. Subtracting $4$ from both sides, we have $2b=4$. Dividing both sides by $2$, we have $b=2$. Substituting this into the equation $a=1-b$, we have $a=1-2=-1$. Thus, $3-a=3-(-1)=3+1=\\\\boxed{4}$.\"\n",
" \"content\": \"From the first equation, we have $3+a=4-b$. Rearranging this equation, we get $a=-1-b$. Substituting this value of $a$ into the second equation, we have $4+b=7+(-1-b)$. Simplifying this equation, we get $b+1=-b+6$. Adding $b$ to both sides and subtracting $1$ from both sides, we have $2b=5$. Therefore, $b=\\\\frac{5}{2}$. Substituting this value of $b$ into the first equation, we have $3+a=4-\\\\frac{5}{2}$. Simplifying this equation, we get $a=\\\\frac{3}{2}$. Finally, we have $3-a=3-\\\\frac{3}{2}=\\\\boxed{\\\\frac{3}{2}}$.\"\n",
" \"content\": \"Manipulating the given equations, we have $a=1-b$ and $b=3+a$. Substituting $a=1-b$ into the second equation, we get $b=3+1-b$, which implies $2b=4$, or $b=2$. Substituting $b=2$ into the first equation, we get $a=1-2=-1$. Finally, evaluating $3-a$, we find that $3-a=3-(-1)=\\\\boxed{4}$.\"\n",
"metric_results on the example data instance: {'expected_success': 1.0, 'success': True, 'success_vote': 1.0, 'voted_answer': 'We start by solving the first equation for $a$: $$3+a=4-b.$$Adding $-3$ to both sides gives $a=1-b$. Substituting this expression for $a$ into the second equation gives $$4+b=7+(1-b).$$Simplifying this expression, we find that $b=2$. Substituting $b=2$ into the first equation to solve for $a$, we find that $a=1-2=-1$. Finally, we have $3-a=3-(-1)=3+1=\\\\boxed{4}$.', 'votes': 27}\n"
"You can use `autogen.ChatCompletion.test` to evaluate the tuned config on an entire dataset. If uncommented and run, the following code takes a while (30 minutes to 1 hour) to evaluate all the test data instances and costs roughly $3."
"[flaml.autogen.oai.completion: 08-01 22:55:55] {916} INFO - evaluating data instance 0\n",
"[flaml.autogen.oai.completion: 08-01 22:56:09] {916} INFO - evaluating data instance 1\n",
"[flaml.autogen.oai.completion: 08-01 22:56:20] {916} INFO - evaluating data instance 2\n",
"[flaml.autogen.oai.completion: 08-01 22:56:28] {916} INFO - evaluating data instance 3\n",
"[flaml.autogen.oai.completion: 08-01 22:56:34] {916} INFO - evaluating data instance 4\n",
"[flaml.autogen.oai.completion: 08-01 22:56:44] {916} INFO - evaluating data instance 5\n",
"[flaml.autogen.oai.completion: 08-01 22:56:57] {916} INFO - evaluating data instance 6\n",
"[flaml.autogen.oai.completion: 08-01 22:57:12] {916} INFO - evaluating data instance 7\n",
"[flaml.autogen.oai.completion: 08-01 22:57:20] {916} INFO - evaluating data instance 8\n",
"[flaml.autogen.oai.completion: 08-01 22:57:24] {916} INFO - evaluating data instance 9\n",
"[flaml.autogen.oai.completion: 08-01 22:57:34] {916} INFO - evaluating data instance 10\n",
"[flaml.autogen.oai.completion: 08-01 22:57:43] {916} INFO - evaluating data instance 11\n",
"[flaml.autogen.oai.completion: 08-01 22:57:52] {916} INFO - evaluating data instance 12\n",
"[flaml.autogen.oai.completion: 08-01 22:58:00] {916} INFO - evaluating data instance 13\n",
"[flaml.autogen.oai.completion: 08-01 22:58:08] {916} INFO - evaluating data instance 14\n",
"[flaml.autogen.oai.completion: 08-01 22:58:14] {916} INFO - evaluating data instance 15\n",
"[flaml.autogen.oai.completion: 08-01 22:58:22] {916} INFO - evaluating data instance 16\n",
"[flaml.autogen.oai.completion: 08-01 22:58:29] {916} INFO - evaluating data instance 17\n",
"[flaml.autogen.oai.completion: 08-01 22:58:40] {916} INFO - evaluating data instance 18\n",
"[flaml.autogen.oai.completion: 08-01 22:58:48] {916} INFO - evaluating data instance 19\n",
"[flaml.autogen.oai.completion: 08-01 22:58:57] {916} INFO - evaluating data instance 20\n",
"[flaml.autogen.oai.completion: 08-01 22:59:15] {916} INFO - evaluating data instance 21\n",
"[flaml.autogen.oai.completion: 08-01 22:59:29] {916} INFO - evaluating data instance 22\n",
"[flaml.autogen.oai.completion: 08-01 22:59:41] {916} INFO - evaluating data instance 23\n",
"[flaml.autogen.oai.completion: 08-01 22:59:54] {916} INFO - evaluating data instance 24\n",
"[flaml.autogen.oai.completion: 08-01 23:00:07] {916} INFO - evaluating data instance 25\n",
"[flaml.autogen.oai.completion: 08-01 23:00:24] {916} INFO - evaluating data instance 26\n",
"[flaml.autogen.oai.completion: 08-01 23:00:39] {916} INFO - evaluating data instance 27\n",
"[flaml.autogen.oai.completion: 08-01 23:00:55] {916} INFO - evaluating data instance 28\n",
"[flaml.autogen.oai.completion: 08-01 23:01:11] {916} INFO - evaluating data instance 29\n",
"[flaml.autogen.oai.completion: 08-01 23:01:26] {916} INFO - evaluating data instance 30\n",
"[flaml.autogen.oai.completion: 08-01 23:01:35] {916} INFO - evaluating data instance 31\n",
"[flaml.autogen.oai.completion: 08-01 23:01:46] {916} INFO - evaluating data instance 32\n",
"[flaml.autogen.oai.completion: 08-01 23:01:54] {916} INFO - evaluating data instance 33\n",
"[flaml.autogen.oai.completion: 08-01 23:02:03] {916} INFO - evaluating data instance 34\n",
"[flaml.autogen.oai.completion: 08-01 23:02:11] {916} INFO - evaluating data instance 35\n",
"[flaml.autogen.oai.completion: 08-01 23:02:27] {916} INFO - evaluating data instance 36\n",
"[flaml.autogen.oai.completion: 08-01 23:02:40] {916} INFO - evaluating data instance 37\n",
"[flaml.autogen.oai.completion: 08-01 23:02:46] {916} INFO - evaluating data instance 38\n",
"[flaml.autogen.oai.completion: 08-01 23:02:56] {916} INFO - evaluating data instance 39\n",
"[flaml.autogen.oai.completion: 08-01 23:03:06] {916} INFO - evaluating data instance 40\n",
"[flaml.autogen.oai.completion: 08-01 23:03:15] {916} INFO - evaluating data instance 41\n",
"[flaml.autogen.oai.completion: 08-01 23:03:23] {916} INFO - evaluating data instance 42\n",
"[flaml.autogen.oai.completion: 08-01 23:03:30] {916} INFO - evaluating data instance 43\n",
"[flaml.autogen.oai.completion: 08-01 23:03:38] {916} INFO - evaluating data instance 44\n",
"[flaml.autogen.oai.completion: 08-01 23:03:49] {916} INFO - evaluating data instance 45\n",
"[flaml.autogen.oai.completion: 08-01 23:03:55] {916} INFO - evaluating data instance 46\n",
"[flaml.autogen.oai.completion: 08-01 23:04:02] {916} INFO - evaluating data instance 47\n",
"[flaml.autogen.oai.completion: 08-01 23:04:14] {916} INFO - evaluating data instance 48\n",
"[flaml.autogen.oai.completion: 08-01 23:04:30] {916} INFO - evaluating data instance 49\n",
"[flaml.autogen.oai.completion: 08-01 23:04:42] {916} INFO - evaluating data instance 50\n",
"[flaml.autogen.oai.completion: 08-01 23:04:53] {916} INFO - evaluating data instance 51\n",
"[flaml.autogen.oai.completion: 08-01 23:05:05] {916} INFO - evaluating data instance 52\n",
"[flaml.autogen.oai.completion: 08-01 23:05:10] {916} INFO - evaluating data instance 53\n",
"[flaml.autogen.oai.completion: 08-01 23:05:22] {916} INFO - evaluating data instance 54\n",
"[flaml.autogen.oai.completion: 08-01 23:05:31] {916} INFO - evaluating data instance 55\n",
"[flaml.autogen.oai.completion: 08-01 23:05:43] {916} INFO - evaluating data instance 56\n",
"[flaml.autogen.oai.completion: 08-01 23:05:49] {916} INFO - evaluating data instance 57\n",
"[flaml.autogen.oai.completion: 08-01 23:05:59] {916} INFO - evaluating data instance 58\n",
"[flaml.autogen.oai.completion: 08-01 23:06:12] {916} INFO - evaluating data instance 59\n",
"[flaml.autogen.oai.completion: 08-01 23:06:20] {916} INFO - evaluating data instance 60\n",
"[flaml.autogen.oai.completion: 08-01 23:06:32] {916} INFO - evaluating data instance 61\n",
"[flaml.autogen.oai.completion: 08-01 23:06:42] {916} INFO - evaluating data instance 62\n",
"[flaml.autogen.oai.completion: 08-01 23:06:54] {916} INFO - evaluating data instance 63\n",
"[flaml.autogen.oai.completion: 08-01 23:07:08] {916} INFO - evaluating data instance 64\n",
"[flaml.autogen.oai.completion: 08-01 23:07:22] {916} INFO - evaluating data instance 65\n",
"[flaml.autogen.oai.completion: 08-01 23:07:34] {916} INFO - evaluating data instance 66\n",
"[flaml.autogen.oai.completion: 08-01 23:07:43] {916} INFO - evaluating data instance 67\n",
"[flaml.autogen.oai.completion: 08-01 23:07:49] {916} INFO - evaluating data instance 68\n",
"[flaml.autogen.oai.completion: 08-01 23:08:00] {916} INFO - evaluating data instance 69\n",
"[flaml.autogen.oai.completion: 08-01 23:08:12] {916} INFO - evaluating data instance 70\n",
"[flaml.autogen.oai.completion: 08-01 23:08:27] {916} INFO - evaluating data instance 71\n",
"[flaml.autogen.oai.completion: 08-01 23:08:36] {916} INFO - evaluating data instance 72\n",
"[flaml.autogen.oai.completion: 08-01 23:08:50] {916} INFO - evaluating data instance 73\n",
"[flaml.autogen.oai.completion: 08-01 23:08:58] {916} INFO - evaluating data instance 74\n",
"[flaml.autogen.oai.completion: 08-01 23:09:10] {916} INFO - evaluating data instance 75\n",
"[flaml.autogen.oai.completion: 08-01 23:09:19] {916} INFO - evaluating data instance 76\n",
"[flaml.autogen.oai.completion: 08-01 23:09:30] {916} INFO - evaluating data instance 77\n",
"[flaml.autogen.oai.completion: 08-01 23:09:38] {916} INFO - evaluating data instance 78\n",
"[flaml.autogen.oai.completion: 08-01 23:09:48] {916} INFO - evaluating data instance 79\n",
"[flaml.autogen.oai.completion: 08-01 23:09:58] {916} INFO - evaluating data instance 80\n",
"[flaml.autogen.oai.completion: 08-01 23:10:08] {916} INFO - evaluating data instance 81\n",
"[flaml.autogen.oai.completion: 08-01 23:10:19] {916} INFO - evaluating data instance 82\n",
"[flaml.autogen.oai.completion: 08-01 23:10:32] {916} INFO - evaluating data instance 83\n",
"[flaml.autogen.oai.completion: 08-01 23:10:37] {916} INFO - evaluating data instance 84\n",
"[flaml.autogen.oai.completion: 08-01 23:10:52] {916} INFO - evaluating data instance 85\n",
"[flaml.autogen.oai.completion: 08-01 23:11:07] {916} INFO - evaluating data instance 86\n",
"[flaml.autogen.oai.completion: 08-01 23:11:22] {916} INFO - evaluating data instance 87\n",
"[flaml.autogen.oai.completion: 08-01 23:11:33] {916} INFO - evaluating data instance 88\n",
"[flaml.autogen.oai.completion: 08-01 23:11:48] {916} INFO - evaluating data instance 89\n",
"[flaml.autogen.oai.completion: 08-01 23:11:55] {916} INFO - evaluating data instance 90\n",
"[flaml.autogen.oai.completion: 08-01 23:12:04] {916} INFO - evaluating data instance 91\n",
"[flaml.autogen.oai.completion: 08-01 23:12:15] {916} INFO - evaluating data instance 92\n",
"[flaml.autogen.oai.completion: 08-01 23:12:27] {916} INFO - evaluating data instance 93\n",
"[flaml.autogen.oai.completion: 08-01 23:12:39] {916} INFO - evaluating data instance 94\n",
"[flaml.autogen.oai.completion: 08-01 23:12:55] {916} INFO - evaluating data instance 95\n",
"[flaml.autogen.oai.completion: 08-01 23:13:05] {916} INFO - evaluating data instance 96\n",
"[flaml.autogen.oai.completion: 08-01 23:13:17] {916} INFO - evaluating data instance 97\n",
"[flaml.autogen.oai.completion: 08-01 23:13:30] {916} INFO - evaluating data instance 98\n",
"[flaml.autogen.oai.completion: 08-01 23:13:43] {916} INFO - evaluating data instance 99\n",
"[flaml.autogen.oai.completion: 08-01 23:13:51] {916} INFO - evaluating data instance 100\n",
"[flaml.autogen.oai.completion: 08-01 23:14:04] {916} INFO - evaluating data instance 101\n",
"[flaml.autogen.oai.completion: 08-01 23:14:09] {916} INFO - evaluating data instance 102\n",
"[flaml.autogen.oai.completion: 08-01 23:14:20] {916} INFO - evaluating data instance 103\n",
"[flaml.autogen.oai.completion: 08-01 23:14:32] {916} INFO - evaluating data instance 104\n",
"[flaml.autogen.oai.completion: 08-01 23:14:46] {916} INFO - evaluating data instance 105\n",
"[flaml.autogen.oai.completion: 08-01 23:14:59] {916} INFO - evaluating data instance 106\n",
"[flaml.autogen.oai.completion: 08-01 23:15:13] {916} INFO - evaluating data instance 107\n",
"[flaml.autogen.oai.completion: 08-01 23:15:23] {916} INFO - evaluating data instance 108\n",
"[flaml.autogen.oai.completion: 08-01 23:15:34] {916} INFO - evaluating data instance 109\n",
"[flaml.autogen.oai.completion: 08-01 23:15:46] {916} INFO - evaluating data instance 110\n",
"[flaml.autogen.oai.completion: 08-01 23:15:56] {916} INFO - evaluating data instance 111\n",
"[flaml.autogen.oai.completion: 08-01 23:16:10] {916} INFO - evaluating data instance 112\n",
"[flaml.autogen.oai.completion: 08-01 23:16:15] {916} INFO - evaluating data instance 113\n",
"[flaml.autogen.oai.completion: 08-01 23:16:27] {916} INFO - evaluating data instance 114\n",
"[flaml.autogen.oai.completion: 08-01 23:16:35] {916} INFO - evaluating data instance 115\n",
"[flaml.autogen.oai.completion: 08-01 23:16:48] {916} INFO - evaluating data instance 116\n",
"[flaml.autogen.oai.completion: 08-01 23:17:02] {916} INFO - evaluating data instance 117\n",
"[flaml.autogen.oai.completion: 08-01 23:17:14] {916} INFO - evaluating data instance 118\n",
"[flaml.autogen.oai.completion: 08-01 23:17:18] {916} INFO - evaluating data instance 119\n",
"[flaml.autogen.oai.completion: 08-01 23:17:31] {916} INFO - evaluating data instance 120\n",
"[flaml.autogen.oai.completion: 08-01 23:17:37] {916} INFO - evaluating data instance 121\n",
"[flaml.autogen.oai.completion: 08-01 23:17:46] {916} INFO - evaluating data instance 122\n",
"[flaml.autogen.oai.completion: 08-01 23:17:53] {916} INFO - evaluating data instance 123\n",
"[flaml.autogen.oai.completion: 08-01 23:18:00] {916} INFO - evaluating data instance 124\n",
"[flaml.autogen.oai.completion: 08-01 23:18:11] {916} INFO - evaluating data instance 125\n",
"[flaml.autogen.oai.completion: 08-01 23:18:17] {916} INFO - evaluating data instance 126\n",
"[flaml.autogen.oai.completion: 08-01 23:18:27] {916} INFO - evaluating data instance 127\n",
"[flaml.autogen.oai.completion: 08-01 23:18:30] {916} INFO - evaluating data instance 128\n",
"[flaml.autogen.oai.completion: 08-01 23:18:45] {916} INFO - evaluating data instance 129\n",
"[flaml.autogen.oai.completion: 08-01 23:18:53] {916} INFO - evaluating data instance 130\n",
"[flaml.autogen.oai.completion: 08-01 23:19:03] {916} INFO - evaluating data instance 131\n",
"[flaml.autogen.oai.completion: 08-01 23:19:07] {916} INFO - evaluating data instance 132\n",
"[flaml.autogen.oai.completion: 08-01 23:19:15] {916} INFO - evaluating data instance 133\n",
"[flaml.autogen.oai.completion: 08-01 23:19:29] {916} INFO - evaluating data instance 134\n",
"[flaml.autogen.oai.completion: 08-01 23:19:44] {916} INFO - evaluating data instance 135\n",
"[flaml.autogen.oai.completion: 08-01 23:19:55] {916} INFO - evaluating data instance 136\n",
"[flaml.autogen.oai.completion: 08-01 23:20:02] {916} INFO - evaluating data instance 137\n",
"[flaml.autogen.oai.completion: 08-01 23:20:15] {916} INFO - evaluating data instance 138\n",
"[flaml.autogen.oai.completion: 08-01 23:20:24] {916} INFO - evaluating data instance 139\n",
"[flaml.autogen.oai.completion: 08-01 23:20:34] {916} INFO - evaluating data instance 140\n",
"[flaml.autogen.oai.completion: 08-01 23:20:40] {916} INFO - evaluating data instance 141\n",
"[flaml.autogen.oai.completion: 08-01 23:20:49] {916} INFO - evaluating data instance 142\n",
"[flaml.autogen.oai.completion: 08-01 23:20:55] {916} INFO - evaluating data instance 143\n",
"[flaml.autogen.oai.completion: 08-01 23:21:05] {916} INFO - evaluating data instance 144\n",
"[flaml.autogen.oai.completion: 08-01 23:21:10] {916} INFO - evaluating data instance 145\n",
"[flaml.autogen.oai.completion: 08-01 23:21:17] {916} INFO - evaluating data instance 146\n",
"[flaml.autogen.oai.completion: 08-01 23:21:25] {916} INFO - evaluating data instance 147\n",
"[flaml.autogen.oai.completion: 08-01 23:21:38] {916} INFO - evaluating data instance 148\n",
"[flaml.autogen.oai.completion: 08-01 23:21:54] {916} INFO - evaluating data instance 149\n",
"[flaml.autogen.oai.completion: 08-01 23:22:05] {916} INFO - evaluating data instance 150\n",
"[flaml.autogen.oai.completion: 08-01 23:22:13] {916} INFO - evaluating data instance 151\n",
"[flaml.autogen.oai.completion: 08-01 23:22:24] {916} INFO - evaluating data instance 152\n",
"[flaml.autogen.oai.completion: 08-01 23:22:35] {916} INFO - evaluating data instance 153\n",
"[flaml.autogen.oai.completion: 08-01 23:22:44] {916} INFO - evaluating data instance 154\n",
"[flaml.autogen.oai.completion: 08-01 23:22:53] {916} INFO - evaluating data instance 155\n",
"[flaml.autogen.oai.completion: 08-01 23:23:01] {916} INFO - evaluating data instance 156\n",
"[flaml.autogen.oai.completion: 08-01 23:23:16] {916} INFO - evaluating data instance 157\n",
"[flaml.autogen.oai.completion: 08-01 23:23:23] {916} INFO - evaluating data instance 158\n",
"[flaml.autogen.oai.completion: 08-01 23:23:31] {916} INFO - evaluating data instance 159\n",
"[flaml.autogen.oai.completion: 08-01 23:23:44] {916} INFO - evaluating data instance 160\n",
"[flaml.autogen.oai.completion: 08-01 23:23:57] {916} INFO - evaluating data instance 161\n",
"[flaml.autogen.oai.completion: 08-01 23:24:03] {916} INFO - evaluating data instance 162\n",
"[flaml.autogen.oai.completion: 08-01 23:24:09] {916} INFO - evaluating data instance 163\n",
"[flaml.autogen.oai.completion: 08-01 23:24:16] {916} INFO - evaluating data instance 164\n",
"[flaml.autogen.oai.completion: 08-01 23:24:28] {916} INFO - evaluating data instance 165\n",
"[flaml.autogen.oai.completion: 08-01 23:24:39] {916} INFO - evaluating data instance 166\n",
"[flaml.autogen.oai.completion: 08-01 23:24:55] {916} INFO - evaluating data instance 167\n",
"[flaml.autogen.oai.completion: 08-01 23:25:00] {916} INFO - evaluating data instance 168\n",
"[flaml.autogen.oai.completion: 08-01 23:25:16] {916} INFO - evaluating data instance 169\n",
"[flaml.autogen.oai.completion: 08-01 23:25:23] {916} INFO - evaluating data instance 170\n",
"[flaml.autogen.oai.completion: 08-01 23:25:31] {916} INFO - evaluating data instance 171\n",
"[flaml.autogen.oai.completion: 08-01 23:25:36] {916} INFO - evaluating data instance 172\n",
"[flaml.autogen.oai.completion: 08-01 23:25:44] {916} INFO - evaluating data instance 173\n",
"[flaml.autogen.oai.completion: 08-01 23:25:56] {916} INFO - evaluating data instance 174\n",
"[flaml.autogen.oai.completion: 08-01 23:26:07] {916} INFO - evaluating data instance 175\n",
"[flaml.autogen.oai.completion: 08-01 23:26:21] {916} INFO - evaluating data instance 176\n",
"[flaml.autogen.oai.completion: 08-01 23:26:27] {916} INFO - evaluating data instance 177\n",
"[flaml.autogen.oai.completion: 08-01 23:26:34] {916} INFO - evaluating data instance 178\n",
"[flaml.autogen.oai.completion: 08-01 23:26:47] {916} INFO - evaluating data instance 179\n",
"[flaml.autogen.oai.completion: 08-01 23:27:01] {916} INFO - evaluating data instance 180\n",
"[flaml.autogen.oai.completion: 08-01 23:27:15] {916} INFO - evaluating data instance 181\n",
"[flaml.autogen.oai.completion: 08-01 23:27:22] {916} INFO - evaluating data instance 182\n",
"[flaml.autogen.oai.completion: 08-01 23:27:29] {916} INFO - evaluating data instance 183\n",
"[flaml.autogen.oai.completion: 08-01 23:27:40] {916} INFO - evaluating data instance 184\n",
"[flaml.autogen.oai.completion: 08-01 23:27:49] {916} INFO - evaluating data instance 185\n",
"[flaml.autogen.oai.completion: 08-01 23:27:55] {916} INFO - evaluating data instance 186\n",
"[flaml.autogen.oai.completion: 08-01 23:28:02] {916} INFO - evaluating data instance 187\n",
"[flaml.autogen.oai.completion: 08-01 23:28:06] {916} INFO - evaluating data instance 188\n",
"[flaml.autogen.oai.completion: 08-01 23:28:18] {916} INFO - evaluating data instance 189\n",
"[flaml.autogen.oai.completion: 08-01 23:28:27] {916} INFO - evaluating data instance 190\n",
"[flaml.autogen.oai.completion: 08-01 23:28:37] {916} INFO - evaluating data instance 191\n",
"[flaml.autogen.oai.completion: 08-01 23:28:49] {916} INFO - evaluating data instance 192\n",
"[flaml.autogen.oai.completion: 08-01 23:29:01] {916} INFO - evaluating data instance 193\n",
"[flaml.autogen.oai.completion: 08-01 23:29:14] {916} INFO - evaluating data instance 194\n",
"[flaml.autogen.oai.completion: 08-01 23:29:21] {916} INFO - evaluating data instance 195\n",
"[flaml.autogen.oai.completion: 08-01 23:29:30] {916} INFO - evaluating data instance 196\n",
"[flaml.autogen.oai.completion: 08-01 23:29:42] {916} INFO - evaluating data instance 197\n",
"[flaml.autogen.oai.completion: 08-01 23:29:56] {916} INFO - evaluating data instance 198\n",
"[flaml.autogen.oai.completion: 08-01 23:30:04] {916} INFO - evaluating data instance 199\n",
"[flaml.autogen.oai.completion: 08-01 23:30:20] {916} INFO - evaluating data instance 200\n",
"performance on test data with the tuned config: {'expected_success': 0.9914855260776184, 'success': 0.9950248756218906, 'success_vote': 0.9203980099502488, 'votes': 31.582089552238806, 'cost': 2.697486000000001, 'inference_cost': 0.01342032835820896}\n"
"# print(\"performance on test data with the tuned config:\", result)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"What about the default, untuned gpt-4 config (with the same prompt as the tuned config)? We can evaluate it and compare:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"performance on test data from gpt-4 with a default config: {'expected_success': 0.6965174129353234, 'success': 0.6965174129353234, 'success_vote': 0.6965174129353234, 'votes': 1.0, 'cost': 1.9264799999999993, 'inference_cost': 0.009584477611940295}\n"
]
}
],
"source": [
"# The following code costs roughly $2 if uncommented and run.\n",
"# print(\"performance on test data from gpt-4 with a default config:\", default_result)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tuned config succeeds in 90.5% test cases\n",
"untuned config succeeds in 69.7% test cases\n"
]
}
],
"source": [
"# print(\"tuned config succeeds in {:.1f}% test cases\".format(result[\"success_vote\"] * 100))\n",
"# print(\"untuned config succeeds in {:.1f}% test cases\".format(default_result[\"success_vote\"] * 100))"
]
},
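{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough reading of the two result dictionaries logged above (a worked check on the recorded numbers, not an additional experiment), and assuming the 201 test instances indexed 0 to 200 in the evaluation log, the `success_vote` values correspond to 185 and 140 solved instances:\n",
"\n",
"$$\\frac{185}{201} \\approx 0.920, \\qquad \\frac{140}{201} \\approx 0.697, \\qquad \\frac{185 - 140}{201} = \\frac{45}{201} \\approx 0.224$$\n",
"\n",
"so the tuned config gains about 22 percentage points in that run. The `inference_cost` field is consistent with the total `cost` divided by the number of instances:\n",
"\n",
"$$\\frac{2.697486}{201} \\approx 0.01342, \\qquad \\frac{1.92648}{201} \\approx 0.00958, \\qquad \\frac{0.01342}{0.00958} \\approx 1.40$$\n",
"\n",
"that is, the tuned config costs roughly 1.4 times as much per instance in that run. The 90.5% printed by the cell above does not equal 185/201; it presumably comes from a different run of the evaluation."
]
},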
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"The default use of GPT-4 has a much lower accuracy, although the default config also has a lower inference cost. What if we heuristically increase the number of responses `n` to 2?"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"# The following evaluation costs $3 and takes longer than one hour if you uncomment and run it.\n",
"# print(\"performance on test data from gpt-4 with a default config and n=2:\", result_n2)\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"The inference cost is doubled and matches the tuned config. But the success rate doesn't improve much. What if we further increase the number of responses n to 5?"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"# The following evaluation costs $8 and takes longer than one hour if you uncomment and run it.\n",
"# print(\"performance on test data from gpt-4 with a default config and n=5:\", result_n5)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"We find that the 'success_vote' metric increases, but at the cost of exceeding the inference budget. The tuned configuration has both a higher 'success_vote' (91% vs. 87%) and a lower average inference cost ($0.015 vs. $0.037 per instance).\n",
"\n",
"A developer could use FLAML to tune the configuration so that it satisfies the target inference budget while maximizing the value obtained from that budget."