* Add isort
* Apply isort on py files
* Fix circular import
* Fix format for notebooks
* Fix format
---------
Co-authored-by: Chi Wang <wang.chi@microsoft.com>
* Prints the version of AutoGenBench from the command line, closing i1458
* Added autogenbench version to timestamp.txt
* Attempting to fix formatting.
* Add a gitignore for autogenbench
* Generalize to read all template dirs from Templates
* AutoGenBench logs telemetry when available.
* Remove spaces if present from template names.
* Bump version.
* Fixed formatting.
* Allow native warning to be skipped. Mount autogen repo in Docker if it can be found (experimental).
* Native execution now occurs in a venv.
* Bump version.
* Fixed a prompt escaping bug evident in GAIA task '6f37996b-2ac7-44b0-8e68-6d28256631b4'
* Updated all scenarios to use template discovery.
* Update with main version of runtime_logging.
* Better handling of Ctrl-C and cleanup of unused containers.
* Even stronger hinting that containers should be removed.
---------
Co-authored-by: gagb <gagb@users.noreply.github.com>
* Initial commit of AutoGenBench
* wording
* typo
* pre-commit reformulation
* Updated README to point to contributor's guide earlier.
* Simplified the description of the JSON format.
* Added print statements to indicate when run.sh and scenario.py are starting.
* Added SocietyOfMind scenario to GAIA.
* Pointing autogenbench clone command to the latest branch.
* Temporarily disable subsample option.
* Updated the GAIA readme to specify how to define a BING API key.
* Fixed and re-enabled the subsample option.
* Added a draft of a blog post.
* Updated authors.
* Incorporating Gagan's feedback.
* Fixed code formatting.
* Updated the help string in the docs.
* Light editing of the AutoGenBench blogpost.
* Support filtering on model tags.
* Added websurfer dependencies to Dockerfile.
* Renamed testbed -> autogenbench
* Attempting to fix formatting.
* Added more graceful handling of task timeouts (the script is allowed to terminate before Docker is stopped).
* Updated the blogpost based on Saleema's and Julia's feedback.
* Fixed formatting... again.
* Added a main MANIFEST to list available scenarios.
* Limit main manifest to directories.
* Manifests now use relative paths.
* All manifests are now relative.
* Updated the contributing guide and addressed Windows path issues.
* Updated the version. Fixed formatting.
* Fixed formatting.
* De-listing Examples, since it has no clear tabulate criteria.
* Updated email in pyproject
* typo in blogpost
* wording
---------
Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu>
Co-authored-by: Qingyun Wu <qingyun0327@gmail.com>
* fixed spelling, minor errors and reformatted using black
* polishing
* added codespell to pre-commit hooks, fixed a number of spelling errors and a few minor bugs in the code
* update autogen library version in notebooks
* fix: typo
* fix: typo
* fix: typo of function name
* fix: typo of function name of test file
* Update test_token_count.py
---------
Co-authored-by: Eric Zhu <ekzhu@users.noreply.github.com>
* Re-added completion logging when using older versions of autogen.
* Extended scenario definitions and templating to include folders.
* Prepare collate_human_eval.py for working with group chat scenarios.
* Converted HumanEval to the folder-based approach, and added GroupChat scenarios.
* Fixed the default termination message.
* Fixed another termination condition.
* Updated compatible autogen versions.
* Added initial support for GAIA benchmark.
* Fixed a bug in executing the finalize scripts.
* Generalized the template further to support multiple folder copy operations.
* Refined GAIA support, and broke scenarios down by difficulty.
* Added some experimental scripts for computing metrics over GAIA. This is a first version, and will likely need refinement.
* Added instructions for cloning GAIA
* Updated README to fix some typos.
* Added a script to format GAIA results for the leaderboard.
* Update samples/tools/testbed/scenarios/GAIA/Templates/BasicTwoAgents/scenario.py
Co-authored-by: LeoLjl <3110503618@qq.com>
---------
Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu>
Co-authored-by: LeoLjl <3110503618@qq.com>
* Re-added completion logging when using older versions of autogen.
* Extended scenario definitions and templating to include folders.
* Prepare collate_human_eval.py for working with group chat scenarios.
* Converted HumanEval to the folder-based approach, and added GroupChat scenarios.
* Fixed the default termination message.
* Fixed another termination condition.
* Updated compatible autogen versions.
* Fixed a bug in executing the finalize scripts.
* Generalized the template further to support multiple folder copy operations.
* Add tests from AutoGPT.
* Update README.md
* Fix typo
* Update samples/tools/testbed/README.md
---------
Co-authored-by: LeoLjl <3110503618@qq.com>
Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu>
* Initial commit of the autogen testbed environment.
* Fixed some typos in the Testbed README.md
* Added some stricter termination logic to the two_agent scenario, and switched the logo task from finding Autogen's logo to finding Microsoft's (it's easier)
* Added documentation to testbed code in preparation for PR
* Added a variation of HumanEval to the Testbed. It is also a reasonable example of how to integrate other benchmarks.
* Removed ChatCompletion.start_logging and related features. Added an explicit TERMINATE output to HumanEval to save 1 turn in each conversation.
* Added metrics utils script for HumanEval
* Updated the requirements in the README.
* Added documentation for HumanEval csv schemas
* Standardized on how the OAI_CONFIG_LIST is handled.
* Removed dot-slash from 'includes' path for cross-platform compatibility
* Missed a file.
* Updated readme to include known-working versions.