From e69dd92c4febf80c77affd29a318bd76da75679e Mon Sep 17 00:00:00 2001
From: gagb
Date: Tue, 16 Jul 2024 15:18:06 -0700
Subject: [PATCH] Improve team-one readme (#225)

* Update readme

* Improve readme further

* Add results
---
 python/teams/team-one/readme.md | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/python/teams/team-one/readme.md b/python/teams/team-one/readme.md
index d23271ea0..7cdd1a1b5 100644
--- a/python/teams/team-one/readme.md
+++ b/python/teams/team-one/readme.md
@@ -55,16 +55,30 @@ Team-One uses agents with the following personas and capabilities:
 
 
 ### Performance
-
 Team-One currently achieves the following performance on complex agent benchmarks:
 
-GAIA
+_GAIA_
 
-TODO
+| Level | Task Completion Rate* |
+|-------|---------------------|
+| Level 1 | 49% (26/53) |
+| Level 2 | 26% (22/86) |
+| Level 3 | 8% (2/26) |
+| Total | 30% (50/165) |
 
-WebArena
+*Indicates the percentage of tasks completed successfully on the development set.
 
-TODO
+_WebArena_
+
+| Site | Task Completion Rate |
+|----------------|----------------|
+| Reddit | 49%  (27/55) |
+| Shopping | 23%  (22/96) |
+| CMS | 16%  (16/101) |
+| Gitlab | 41%  (32/79) |
+| Maps | 35%  (23/65) |
+| Multiple Sites | %  (--/26) |
+| Total | 28%  (120/422) |
 
 
 # Setup