* Re-added completion logging when using older versions of autogen.
* Extended scenario definitions and templating to include folders.
* Prepared collate_human_eval.py to work with group chat scenarios.
* Converted HumanEval to the folder-based approach, and added GroupChat scenarios.
* Fixed the default termination message.
* Fixed another termination condition.
* Updated compatible autogen versions.
* Added initial support for GAIA benchmark.
* Fixed a bug in executing the finalize scripts.
* Generalized the template further to support multiple folder copy operations.
* Refined GAIA support, and broke scenarios down by difficulty.
* Added some experimental scripts for computing metrics over GAIA. This is a first version, and will likely need refinement.
* Added instructions for cloning GAIA.
* Updated README to fix some typos.
* Added a script to format GAIA results for the leaderboard.
* Updated samples/tools/testbed/scenarios/GAIA/Templates/BasicTwoAgents/scenario.py
---------
Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu>
Co-authored-by: LeoLjl <3110503618@qq.com>