* fix: typo
* fix: typo
* fix: typo in function name
* fix: typo in function name in test file
* Update test_token_count.py
---------
Co-authored-by: Eric Zhu <ekzhu@users.noreply.github.com>
* Re-added completion logging when using older versions of autogen.
* Extended scenario definitions and templating to include folders.
* Prepare collate_human_eval.py for working with group chat scenarios.
* Converted HumanEval to the folder-based approach, and added GroupChat scenarios.
* Fixed the default termination message.
* Fixed another termination condition.
* Updated compatible autogen versions.
* Fixed a bug in executing the finalize scripts.
* Generalized the template further to support multiple folder copy operations.
* Add tests from AutoGPT.
* Update README.md
* Fix typo
* Update samples/tools/testbed/README.md
---------
Co-authored-by: LeoLjl <3110503618@qq.com>
Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu>
* Initial commit of the autogen testbed environment.
* Fixed some typos in the Testbed README.md
* Added some stricter termination logic to the two_agent scenario, and switched the logo task from finding Autogen's logo to finding Microsoft's (it's easier).
* Added documentation to testbed code in preparation for PR
* Added a variation of HumanEval to the Testbed. It is also a reasonable example of how to integrate other benchmarks.
* Removed ChatCompletion.start_logging and related features. Added an explicit TERMINATE output to HumanEval to save 1 turn in each conversation.
* Added metrics utils script for HumanEval
* Updated the requirements in the README.
* Added documentation for HumanEval csv schemas
* Standardized on how the OAI_CONFIG_LIST is handled.
* Removed dot-slash from 'includes' path for cross-platform compatibility
* Missed a file.
* Updated the README to include known-working versions.