Post release update (#985)
* news update
* doc update
* avoid KeyError
* bump version to 1.2.1
* handle empty responses
* typo
* eval function
This commit is contained in:
parent a701cd82f8
commit c780d79004
@@ -14,7 +14,7 @@
 <br>
 </p>
 
-:fire: OpenAI GPT-3 models support in v1.1.3. ChatGPT and GPT-4 support will be added in v1.2.0.
+:fire: v1.2.0 is released with support for ChatGPT and GPT-4.
 
 :fire: A [lab forum](https://github.com/microsoft/FLAML/tree/tutorial-aaai23/tutorial) on FLAML at AAAI 2023.
 
@@ -290,8 +290,16 @@ def eval_math_responses(responses, solution=None, **args):
     Returns:
         dict: The success metrics.
     """
-    success_list = []
     n = len(responses)
+    if not n:
+        return {
+            "expected_success": 0,
+            "success": False,
+            "success_vote": 0,
+            "voted_answer": None,
+            "votes": 0,
+        }
+    success_list = []
     if solution is not None:
         for i in range(n):
             response = responses[i]
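For orientation (not part of the diff itself), a minimal usage sketch of the new guard; it assumes `eval_math_responses` is importable from `flaml.autogen.math_utils`, the module referenced by the docs touched in this commit:

```python
from flaml.autogen.math_utils import eval_math_responses

# With an empty response list, the function now returns zeroed metrics
# (the keys mirror the dict added in the hunk above) instead of failing downstream.
metrics = eval_math_responses([], None)
print(metrics["success"], metrics["votes"])  # False 0
```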
@@ -843,7 +843,7 @@ class Completion:
         choices = response["choices"]
         if "text" in choices[0]:
             return [choice["text"] for choice in choices]
-        return [choice["message"]["content"] for choice in choices]
+        return [choice["message"].get("content", "") for choice in choices]
 
 
 class ChatCompletion(Completion):
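As a standalone illustration of this change (the enclosing method name is not visible in the hunk, so the function below is hypothetical), `dict.get` with a default avoids the KeyError mentioned in the commit message when a chat message carries no "content" field, e.g. an empty response:

```python
def extract_texts(response: dict) -> list:
    # Hypothetical restatement of the changed lines above.
    choices = response["choices"]
    if "text" in choices[0]:
        # Completion-style responses carry the text directly.
        return [choice["text"] for choice in choices]
    # Chat-style responses: "content" may be missing, so default to "".
    return [choice["message"].get("content", "") for choice in choices]

# A chat choice whose message lacks "content" no longer raises a KeyError.
print(extract_texts({"choices": [{"message": {"role": "assistant"}}]}))  # ['']
```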
@@ -1 +1 @@
-__version__ = "1.2.0"
+__version__ = "1.2.1"
@@ -216,6 +216,7 @@ def test_math(num_samples=-1):
     print("tuned config", config)
     result = oai.ChatCompletion.test(test_data_sample, config)
     print("result from tuned config:", result)
+    print("empty responses", eval_math_responses([], None))
 
 
 if __name__ == "__main__":
@@ -56,7 +56,7 @@ test_data = [
 ]
 ```
 
-### Defining the metric
+### Define the metric
 
 Before starting tuning, you need to define the metric for the optimization. For each code generation task, we can use the model to generate multiple candidate responses, and then select one from them. If the final selected response can pass a unit test, we consider the task as successfully solved. Then we can define the average success rate on a collection of tasks as the optimization metric.
 
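To make the paragraph in that hunk concrete, here is a hedged sketch of such a metric (not taken from the docs being edited); it counts a task as solved if a selected candidate passes its unit test, using `exec` purely for illustration:

```python
from typing import Dict, List

def success_metric(responses: List[str], test: str, **kwargs) -> Dict:
    """Sketch: success means a selected candidate passes the task's unit test."""
    for code in responses:  # select the first candidate that passes
        try:
            exec(code + "\n" + test, {})
            return {"success": True}
        except Exception:
            continue
    return {"success": False}

# Hypothetical instance: the unit test exercises a generated `add` function.
print(success_metric(["def add(a, b):\n    return a + b"], "assert add(1, 2) == 3"))
```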
@@ -69,7 +69,7 @@ eval_with_generated_assertions = partial(eval_function_completions, assertions=g
 
 This function will first generate assertion statements for each problem. Then, it uses the assertions to select the generated responses.
 
-### Tuning Hyperparameters for OpenAI
+### Tune the hyperparameters
 
 The tuning will be performed under the specified optimization budgets.
 
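For reference, a hedged sketch of what tuning under budgets can look like with `flaml.oai`; `tune_data` and the prompt template are placeholders, and the parameter names follow the FLAML autogen tuning docs of this era rather than anything shown in this hunk:

```python
from flaml import oai

config, analysis = oai.Completion.tune(
    data=tune_data,                            # validation instances (list of dicts)
    metric="success",                          # metric key returned by the eval function
    mode="max",                                # maximize the metric
    eval_func=eval_with_generated_assertions,  # evaluation function defined above
    inference_budget=0.05,                     # average $ allowed per instance at inference
    optimization_budget=3,                     # total $ allowed for the tuning run
    num_samples=-1,                            # -1: let the optimization budget decide
    prompt="{definition}",                     # hypothetical template over instance keys
)
print("optimized config:", config)
```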
@@ -44,13 +44,13 @@ Collect a diverse set of instances. They can be stored in an iterable of dicts.
 The evaluation function should take a list of responses, and other keyword arguments corresponding to the keys in each validation data instance as input, and output a dict of metrics. For example,
 
 ```python
-def success_metrics(responses: List[str], problem: str, solution: str) -> Dict:
+def eval_math_responses(responses: List[str], solution: str, **args) -> Dict:
     # select a response from the list of responses
     # check whether the answer is correct
     return {"success": True or False}
 ```
 
-`flaml.autogen` offers some example evaluation functions for common tasks such as code generation and math problem solving.
+[`flaml.autogen.code_utils`](../reference/autogen/code_utils) and [`flaml.autogen.math_utils`](../reference/autogen/math_utils) offer some example evaluation functions for code generation and math problem solving.
 
 ### Metric to optimize
 
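Finally, a small hypothetical example (not part of this commit) of how the keys of a validation instance line up with the evaluation function's keyword arguments described in that hunk:

```python
from typing import Dict, List

# Hypothetical validation instances: every key becomes a keyword argument
# when the evaluation function is called for that instance.
tune_data = [
    {"problem": "What is 2 + 2?", "solution": "4"},
    {"problem": "What is 3 * 3?", "solution": "9"},
]

def eval_responses(responses: List[str], solution: str, **args) -> Dict:
    # Success if any candidate response contains the reference answer.
    return {"success": any(solution in r for r in responses)}

print(eval_responses(["The answer is 4."], **tune_data[0]))  # {'success': True}
```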