* fix: typo
* fix: typo
* fix: typo of function name
* fix: typo of function name of test file
* Update test_token_count.py
---------
Co-authored-by: Eric Zhu <ekzhu@users.noreply.github.com>
* Re-added completion logging when using older versions of autogen.
* Extended scenario definitions and templating to include folders.
* Prepare collate_human_eval.py for working with group chat scenarios.
* Converted HumanEval to the folder-based approach, and added GroupChat scenarios.
* Fixed the default termination message.
* Fixed another termination condition.
* Updated compatible autogen versions.
* Added initial support for GAIA benchmark.
* Fixed a bug in executing the finalize scripts.
* Generalized the template further to support multiple folder copy operations.
* Refined GAIA support, and broke scenarios down by difficulty.
* Added some experimental scripts for computing metrics over GAIA. This is a first version, and will likely need refinement.
* Added instructions for cloning GAIA.
* Updated README to fix some typos.
* Added a script to format GAIA results for the leaderboard.
* Update samples/tools/testbed/scenarios/GAIA/Templates/BasicTwoAgents/scenario.py
Co-authored-by: LeoLjl <3110503618@qq.com>
---------
Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu>
Co-authored-by: LeoLjl <3110503618@qq.com>