Post release update (#985)

* news update

* doc update

* avoid KeyError

* bump version to 1.2.1

* handle empty responses

* typo

* eval function
Chi Wang 2023-04-10 13:46:28 -07:00 committed by GitHub
parent a701cd82f8
commit c780d79004
7 changed files with 17 additions and 8 deletions

View File

@@ -14,7 +14,7 @@
<br>
</p>
-:fire: OpenAI GPT-3 models support in v1.1.3. ChatGPT and GPT-4 support will be added in v1.2.0.
+:fire: v1.2.0 is released with support for ChatGPT and GPT-4.
:fire: A [lab forum](https://github.com/microsoft/FLAML/tree/tutorial-aaai23/tutorial) on FLAML at AAAI 2023.

View File

@@ -290,8 +290,16 @@ def eval_math_responses(responses, solution=None, **args):
    Returns:
        dict: The success metrics.
    """
-    success_list = []
    n = len(responses)
+    if not n:
+        return {
+            "expected_success": 0,
+            "success": False,
+            "success_vote": 0,
+            "voted_answer": None,
+            "votes": 0,
+        }
+    success_list = []
    if solution is not None:
        for i in range(n):
            response = responses[i]
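
As a quick check of the new guard, a minimal sketch that simply mirrors the test case added later in this commit:

```python
from flaml.autogen.math_utils import eval_math_responses

# An empty response list now returns zeroed metrics instead of raising an error.
print(eval_math_responses([], None))
# {'expected_success': 0, 'success': False, 'success_vote': 0, 'voted_answer': None, 'votes': 0}
```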

View File

@@ -843,7 +843,7 @@ class Completion:
        choices = response["choices"]
        if "text" in choices[0]:
            return [choice["text"] for choice in choices]
-        return [choice["message"]["content"] for choice in choices]
+        return [choice["message"].get("content", "") for choice in choices]
class ChatCompletion(Completion):
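
The switch from `["content"]` to `.get("content", "")` means a chat choice whose message lacks a `content` field yields an empty string instead of a `KeyError`. A minimal sketch with a hand-built response dict (illustrative shape, not a real API payload):

```python
# Hypothetical chat response whose message carries no "content" key.
response = {"choices": [{"message": {"role": "assistant"}}]}

choices = response["choices"]
# Old: choices[0]["message"]["content"] would raise KeyError.
# New: a missing "content" field falls back to an empty string.
texts = [choice["message"].get("content", "") for choice in choices]
print(texts)  # ['']
```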

View File

@@ -1 +1 @@
-__version__ = "1.2.0"
+__version__ = "1.2.1"

View File

@@ -216,6 +216,7 @@ def test_math(num_samples=-1):
    print("tuned config", config)
    result = oai.ChatCompletion.test(test_data_sample, config)
    print("result from tuned config:", result)
+    print("empty responses", eval_math_responses([], None))
if __name__ == "__main__":

View File

@@ -56,7 +56,7 @@ test_data = [
]
```
-### Defining the metric
+### Define the metric
Before starting tuning, you need to define the metric for the optimization. For each code generation task, we can use the model to generate multiple candidate responses, and then select one from them. If the final selected response can pass a unit test, we consider the task as successfully solved. Then we can define the average success rate on a collection of tasks as the optimization metric.
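
A hedged sketch of such a metric function (the `run_unit_test` helper and the HumanEval-style `check` convention here are assumptions for illustration; FLAML's actual implementation is `eval_function_completions` in `flaml.autogen.code_utils`):

```python
from typing import Dict, List

def run_unit_test(code: str, test: str, entry_point: str) -> bool:
    # Hypothetical test runner: define the candidate and its unit test, then
    # call the HumanEval-style `check` function on the candidate.
    try:
        env: dict = {}
        exec(code, env)
        exec(test, env)
        env["check"](env[entry_point])
        return True
    except Exception:
        return False

def eval_code_responses(responses: List[str], test: str, entry_point: str) -> Dict:
    # Select one candidate (here, simply the first) and report unit-test success;
    # averaging "success" over a collection of tasks gives the success rate to optimize.
    selected = responses[0] if responses else ""
    return {"success": run_unit_test(selected, test, entry_point)}
```
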
@@ -69,7 +69,7 @@ eval_with_generated_assertions = partial(eval_function_completions, assertions=g
This function will first generate assertion statements for each problem. Then, it uses the assertions to select the generated responses.
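
For readers less familiar with `functools.partial`, a minimal sketch of the binding pattern (the stub bodies are illustrative placeholders, not FLAML's implementations):

```python
from functools import partial

def generate_assertions(definition: str) -> str:
    # Illustrative stub: in practice this would ask a model to write
    # assert statements for the given function definition.
    return f"assert candidate is not None  # for: {definition!r}"

def eval_function_completions(responses, definition, assertions=None):
    # Illustrative stub: if `assertions` is a callable, generate the assertion
    # statements on the fly, then use them to pick and score a response.
    if callable(assertions):
        assertions = assertions(definition)
    return {"success": bool(responses), "assertions": assertions}

# Pre-binding `assertions` produces an evaluation function whose remaining
# parameters match the keys of each validation-data instance.
eval_with_generated_assertions = partial(eval_function_completions, assertions=generate_assertions)
print(eval_with_generated_assertions(["def f(): pass"], definition="def f():"))
```
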
-### Tuning Hyperparameters for OpenAI
+### Tune the hyperparameters
The tuning will be performed under the specified optimization budgets.

View File

@@ -44,13 +44,13 @@ Collect a diverse set of instances. They can be stored in an iterable of dicts.
The evaluation function should take a list of responses, and other keyword arguments corresponding to the keys in each validation data instance as input, and output a dict of metrics. For example,
```python
-def success_metrics(responses: List[str], problem: str, solution: str) -> Dict:
+def eval_math_responses(responses: List[str], solution: str, **args) -> Dict:
    # select a response from the list of responses
    # check whether the answer is correct
    return {"success": True or False}
```
-`flaml.autogen` offers some example evaluation functions for common tasks such as code generation and math problem solving.
+[`flaml.autogen.code_utils`](../reference/autogen/code_utils) and [`flaml.autogen.math_utils`](../reference/autogen/math_utils) offer some example evaluation functions for code generation and math problem solving.
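
A hedged usage sketch of one of those helpers (toy inputs; this assumes `eval_math_responses` parses MATH-style `\boxed{...}` answers, and the metric keys follow the dict shape shown earlier in this commit):

```python
from flaml.autogen.math_utils import eval_math_responses

# Two toy candidate answers for a problem whose ground-truth solution boxes "2".
metrics = eval_math_responses(["\\boxed{2}", "\\boxed{3}"], solution="\\boxed{2}")
print(metrics["success"], metrics["success_vote"], metrics["votes"])
```
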
### Metric to optimize