8 Commits

Author SHA1 Message Date
KazooTTT
a122ffe541
Fix/typo (#1034)
* fix: typo

* fix: typo

* fix: typo of function name

* fix: typo of function name of test file

* Update test_token_count.py

---------

Co-authored-by: Eric Zhu <ekzhu@users.noreply.github.com>
2023-12-22 16:00:46 +00:00
afourney
4dcb41531f
Allow users to specify the Docker image to use with Testbed (#986)
* Allow users to specify the Docker image to use (or build a good AutoGen default image if not specified).

* Added lmm and graphs to dockerfile
2023-12-15 20:37:22 +00:00
afourney
a107233e23
Testbed can now read the OPENAI_API_KEY in addition to the OAI_CONFIG_LIST (#848)
Co-authored-by: Victor Dibia <victordibia@microsoft.com>
2023-12-04 22:14:00 +00:00
afourney
45c2a78970
Testbed folders (#792)
* Re-added completion logging when using older versions of autogen.

* Extended scenario definitions and templating to include folders.

* Prepare collate_human_eval.py for working with group chat scenarios.

* Converted HumanEval to the folder-based approach, and added GroupChat scenarios.

* Fixed the default termination message.

* Fixed another termination condition.

* Updated compatible autogen versions.

* Fixed a bug in executing the finalize scripts.

* Generalized the template further to support multiple folder copy operations.

* Add tests from AutoGPT.

* Update README.md

* Fix typo

* Update samples/tools/testbed/README.md

---------

Co-authored-by: LeoLjl <3110503618@qq.com>
Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu>
2023-11-30 16:43:03 +00:00
afourney
b0a6d72b8c
Addresses issue 635, relating to newlines in Windows. (#678)
2023-11-15 22:27:10 +00:00
afourney
72f488e4d7
Allows users to specify a different requirements.txt file to install in Docker, to test other versions or branches of Autogen. Closes #662 (#671)
2023-11-15 00:33:09 +00:00
afourney
c37453735a
Sets the umask before executing the task in Docker. (#593)
* Sets the umask before executing the task in Docker.

* Added version backward compatibility for disabling cache and setting timeouts.
2023-11-14 21:14:38 +00:00
afourney
1c4a5e6a1a
Added a simple Testbed tool for repeatedly running templated Autogen scenarios with tightly-controlled initial conditions. (#455)
* Initial commit of the autogen testbed environment.

* Fixed some typos in the Testbed README.md

* Added some stricter termination logic to the two_agent scenario, and switched the logo task from finding Autogen's logo to finding Microsoft's (it's easier)

* Added documentation to testbed code in preparation for PR

* Added a variation of HumanEval to the Testbed. It is also a reasonable example of how to integrate other benchmarks.

* Removed ChatCompletion.start_logging and related features. Added an explicit TERMINATE output to HumanEval to save 1 turn in each conversation.

* Added metrics utils script for HumanEval

* Updated the requirements in the README.

* Added documentation for HumanEval csv schemas

* Standardized on how the OAI_CONFIG_LIST is handled.

* Removed dot-slash from 'includes' path for cross-platform compatibility

* Missed a file.

* Updated readme to include known-working versions.
2023-11-04 10:38:43 +00:00