Hussein Mozannar e11d84b996
Adding Benchmarks to agbench (#3803)
* Move from tomllib to tomli

* added example code for magentic-one + code comments

* adding benchmarks temporarily

* add license for datasets

* revert changes to magentic-one

* change license location

---------

Co-authored-by: Ryan Sweet <rysweet@microsoft.com>
2024-10-18 06:33:33 +02:00

8 lines
573 B
Markdown

# WebArena Benchmark
This scenario implements the [WebArena](https://github.com/web-arena-x/webarena/tree/main) benchmark. The evaluation code has been modified from WebArena in [evaluation_harness](Templates/Common/evaluation_harness) we retain the License from WebArena and include it here [LICENSE](Templates/Common/evaluation_harness/LICENSE).
## References
Zhou, Shuyan, Frank F. Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng et al. "Webarena: A realistic web environment for building autonomous agents." arXiv preprint arXiv:2307.13854 (2023).