autogen

mirror of https://github.com/microsoft/autogen.git synced 2025-08-11 02:01:10 +00:00

Author	SHA1	Message	Date
Yiran Wu	aa946b3507	Add MATH tests to testbed (#914) * add MATH eval to testbed * update --------- Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu>	2023-12-18 14:37:28 +00:00
LeoLjl	2ee944df37	Add collate file and more tests from autogpt into testbed (#915) * Add collate file. * Add requirements.txt, Fix typo, Add tests * More tests. * Update check.py * Update scenario.py * Update prepare_autogpt.py * Update prepare_autogpt.py * More tasks for testset. * Add more tests. * Update docs. * Optimize file organize.	2023-12-14 16:26:30 +00:00
afourney	45c2a78970	Testbed folders (#792) * Re-added completion logging when using older versions of autogen. * Extended scenario definitions and templating to include folders. * Prepare collate_human_eval.py for working with group chat scenarios. * Converted HumanEval to the folder-based approach, and added GroupChat scenarios. * Fixed the default termination message. * Fixed another termination condition. * Updated compatible autogen versions. * Fixed a bug in executing the finalize scripts. * Generalized the template further to support multiple folder copy operations. * Add tests from AutoGPT. * Update README.md * Fix typo * Update samples/tools/testbed/README.md --------- Co-authored-by: LeoLjl <3110503618@qq.com> Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu>	2023-11-30 16:43:03 +00:00
afourney	f790109271	Re-added completion logging when using older versions of autogen. (#701)	2023-11-18 17:11:25 +00:00
afourney	72f488e4d7	Allows users to specify a different requirements.txt file to install in Docker, to test other versions or branches of Autogen. Closes #662 (#671)	2023-11-15 00:33:09 +00:00
afourney	c37453735a	Sets the umask before executing the task in Docker. (#593) * Sets the umask before executing the task in Docker. * Added version backward compatibility for disabling cache and setting timeouts.	2023-11-14 21:14:38 +00:00
afourney	1c4a5e6a1a	Added a simple Testbed tool for repeatedly running templated Autogen scenarios with tightly-controlled initial conditions. (#455) * Initial commit of the autogen testbed environment. * Fixed some typos in the Testbed README.md * Added some stricter termination logic to the two_agent scenario, and swiched the logo task from finding Autogen's logo, to finding Microsoft's (it's easier) * Added documentation to testbed code in preparation for PR * Added a variation of HumanEval to the Testbed. It is also a reasonable example of how to integrate other benchmarks. * Removed ChatCompletion.start_logging and related features. Added an explicit TERMINATE output to HumanEval to save 1 turn in each conversation. * Added metrics utils script for HumanEval * Updated the requirements in the README. * Added documentation for HumanEval csv schemas * Standardized on how the OAI_CONFIG_LIST is handled. * Removed dot-slash from 'includes' path for cross-platform compatibility * Missed a file. * Updated readme to include known-working versions.	2023-11-04 10:38:43 +00:00

7 Commits