mirror of
https://github.com/microsoft/autogen.git
synced 2025-11-02 10:50:03 +00:00
1. Add host network support in Docker and remove unused requirements
from argument check.
2. Use pandas to simplify summary statistic calculations.
3. Add running time to summary statistics.
```
Using tabulation method defined in '/home/ekzhu/autogen/python/packages/agbench/benchmarks/HumanEval/Scripts/custom_tabulate.py'
    Task Id       Trial 0 Success      Trial 0 Time
--  ------------  -----------------  --------------
 0  HumanEval_0   True                            3
 1  HumanEval_1   False                          15
 2  HumanEval_2   True                            2
 3  HumanEval_3   True                           11
 4  HumanEval_4   True                            4
 5  HumanEval_5   True                            2
 6  HumanEval_6   False                          18
 7  HumanEval_7   True                            2
 8  HumanEval_8   True                            2
 9  HumanEval_9   True                           12
10  HumanEval_10  False                          11
11  HumanEval_11  True                            2
12  HumanEval_12  True                            3
13  HumanEval_13  True                            1
14  HumanEval_14  True                            4
15  HumanEval_15  True                            1
16  HumanEval_16  True                            2
17  HumanEval_17  False                          76
18  HumanEval_18  True                            4
19  HumanEval_19  True                            3
20  HumanEval_20  True                            5
21  HumanEval_21  True                            3
22  HumanEval_22  True                            1
23  HumanEval_23  True                            2
24  HumanEval_24                                nan
Summary Statistics
           Successes    Failures    Missing    Total    Average Success Rate    Average Time    Total Time
-------  -----------  ----------  ---------  -------  ----------------------  --------------  ------------
Trial 0           20           4          1       25                     0.8           7.875           189
CAUTION: 'autogenbench tabulate' is in early preview and is not thoroughly tested.
Please do not cite values from these calculations in academic work without first inspecting and verifying the results in the run logs yourself.
```
The default tabulate output now looks like the block above.
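For reference, here is a minimal sketch of how summary statistics like these could be computed with pandas. It is not the code in `custom_tabulate.py`: the `summarize` helper and the way the frame is built are assumptions made for illustration. The data is copied from the sample output, and the definitions (success rate over all tasks including missing ones, times averaged over completed tasks only) are inferred from the numbers shown there.

```python
import pandas as pd

# Per-task running times copied from the sample output; None marks the missing run.
times = [3, 15, 2, 11, 4, 2, 18, 2, 2, 12, 11, 2, 3, 1, 4, 1, 2, 76, 4, 3, 5, 3, 1, 2, None]
failed = {1, 6, 10, 17}  # task indices reported as False in the sample output

results = pd.DataFrame(
    {
        "Task Id": [f"HumanEval_{i}" for i in range(len(times))],
        "Trial 0 Success": [None if t is None else (i not in failed) for i, t in enumerate(times)],
        "Trial 0 Time": times,
    }
)


def summarize(df: pd.DataFrame, trial: int) -> pd.DataFrame:
    """Hypothetical helper: build a one-row summary table for a single trial."""
    success = df[f"Trial {trial} Success"]
    time = df[f"Trial {trial} Time"]
    successes = int(success.eq(True).sum())
    failures = int(success.eq(False).sum())
    missing = int(success.isna().sum())
    total = len(df)
    row = {
        "Successes": successes,
        "Failures": failures,
        "Missing": missing,
        "Total": total,
        # Assumption: missing runs count toward the denominator (20 / 25 = 0.8).
        "Average Success Rate": successes / total,
        # mean() and sum() skip NaN, so only completed runs contribute.
        "Average Time": time.mean(),
        "Total Time": time.sum(),
    }
    return pd.DataFrame([row], index=[f"Trial {trial}"])


summary = summarize(results, 0)
print(summary.to_string())
```

With the sample data, the resulting row agrees with the summary table in the output above.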
---------
Co-authored-by: Ryan Sweet <rysweet@microsoft.com>