WebArena Benchmark

This scenario implements the WebArena benchmark. The evaluation code in evaluation_harness has been modified from WebArena. We retain the WebArena license and include it here as LICENSE.

References

Zhou, Shuyan, Frank F. Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng et al. "WebArena: A realistic web environment for building autonomous agents." arXiv preprint arXiv:2307.13854 (2023).