27 Commits

Author SHA1 Message Date
Eric Zhu
d5068d9b6c
update contact information on the repo and release package (#3383)
* update contact information on the repo and release package

* update contact

* update

* fix format
2024-08-20 02:04:44 +00:00
Eduardo Salinas
ebde196d6b
feat: add event logging api and more tracing (#2478)
* feat: add event logging api and more tracing

* code fmt shenanigans

* fixup

* Update test_agent_logging.py

* Update test_agent_logging.py

* Update test_agent_logging.py

* Update sqlite_logger.py

* Update test_agent_logging.py

* Update sqlite_logger.py

---------

Co-authored-by: Chi Wang <wang.chi@microsoft.com>
2024-04-23 22:27:47 +00:00
Li Jiang
42b27b9a9d
Add isort (#2265)
* Add isort

* Apply isort on py files

* Fix circular import

* Fix format for notebooks

* Fix format

---------

Co-authored-by: Chi Wang <wang.chi@microsoft.com>
2024-04-05 02:26:06 +00:00
afourney
061a857b3d
AutoGenBench: Handle Ctrl-C more gracefully. (#2174)
* Prints the version of AutoGenBench from the command line, closing i1458

* Added autogenbench version to timestamp.txt

* Attempting to fix formatting.

* Add a gitignore for autogenbench

* Generalize to read all template dirs from Templates

* AutoGenBench logs telemetry when available.

* Remove spaces if present from template names.

* Bump version.

* Fixed formatting.

* Allow native warning to be skipped. Mount autogen repo in Docker if it can be found (experimental).

* Native execution now occurs in a venv.

* Bump version.

* Fixed a prompt escaping bug evident in GAIA task '6f37996b-2ac7-44b0-8e68-6d28256631b4'

* Updated all scenarios to use template discovery.

* Update with main version of runtime_logging.

* Better handling of Ctrl-C and cleanup of unused containers.

* Even stronger hinting that containers should be removed.

---------

Co-authored-by: gagb <gagb@users.noreply.github.com>
2024-03-30 01:16:41 +00:00
olgavrou
af9b300be3
add webarena in samples (#2114)
* add webarena in samples/tools

* Update samples/tools/webarena/README.md

Co-authored-by: gagb <gagb@users.noreply.github.com>

* Update samples/tools/webarena/README.md

Co-authored-by: gagb <gagb@users.noreply.github.com>

* Update samples/tools/webarena/README.md

Co-authored-by: gagb <gagb@users.noreply.github.com>

* update installation instructions

* black formatting

* Update README.md

---------

Co-authored-by: gagb <gagb@users.noreply.github.com>
Co-authored-by: Eric Zhu <ekzhu@users.noreply.github.com>
2024-03-25 17:43:30 +00:00
Davor Runje
b1839c3845
Update pre-commit (#2067)
* update pre-commit

* update pre-commit.ci

* lint fix
2024-03-19 02:55:37 +00:00
afourney
3a5dd361b9
Bump autogenbench version. (#2027) 2024-03-15 14:26:09 +00:00
Eduardo Salinas
6dbae0a88b
fix: [autogenbench] writing to stdout encoding error in win-os (#2002) 2024-03-14 15:45:21 +00:00
Eduardo Salinas
a814ba54de
fix: [autogenbench] windows fails unless we specify encoding (#1957) 2024-03-12 21:28:16 +00:00
olgavrou
ce71d85e77
Ability to fine tune custom model on conversable agents (#1787)
* Ability to update_model on conversable agents

* formatting

* formatting

* move code from conversable agent into samples/tools and add testing and README

* forgot install step

* fix

* leave core lib unchanged and move everything to samples/tools

* remove skip openai

---------

Co-authored-by: Eric Zhu <ekzhu@users.noreply.github.com>
2024-03-11 21:26:53 +00:00
Yiran Wu
2503000c22
update (#1891) 2024-03-07 15:02:48 +00:00
afourney
085bf6cf3d
Version 0.0.2 of Autogenbench (#1548)
* Prints the version of AutoGenBench from the command line, closing i1458

* Added autogenbench version to timestamp.txt

* Attempting to fix formatting.

* Add a gitignore for autogenbench

* Generalize to read all template dirs from Templates

* AutoGenBench logs telemetry when available.

* Remove spaces if present from template names.

* Bump version.

* Fixed formatting.

* Allow native warning to be skipped. Mount autogen repo in Docker if it can be found (experimental).

* Native execution now occurs in a venv.

* Bump version.

* Fixed a prompt escaping bug evident in GAIA task '6f37996b-2ac7-44b0-8e68-6d28256631b4'

* Updated all scenarios to use template discovery.

* Update with main version of runtime_logging.

---------

Co-authored-by: gagb <gagb@users.noreply.github.com>
2024-02-24 18:12:57 +00:00
afourney
b10e065456
Bump autogenbench version. (#1485) 2024-01-31 21:32:41 +00:00
afourney
cd199c7ab7
Introduces AutoGenBench (#1048)
* Initial commit of AutoGenBench

* wording

* typo

* pre-commit reformulation

* Updated README to point to contributor's guide earlier.

* Simplified the description of the JSON format.

* Added print statements to indicate when run.sh and scenario.py are starting.

* Added SocietyOfMind scenario to GAIA.

* Pointing autogenbench clone command to the latest branch.

* Temporarily disable subsample option.

* Updated the GAIA readme to specify how to define a BING API key.

* Fixed and re-enabled the subsample option.

* Added a draft of a blog post.

* Updated authors.

* Incorporating Gagan's feedback.

* Fixed code formatting.

* Updated the help string in the docs.

* Light editing of the AutoGenBench blogpost.

* Support filtering on model tags.

* Added websurfer dependencies to Dockerfile.

* Renamed testbed -> autogenbench

* Attempting to fix formatting.

* Added more graceful handling of task timeouts (the script is allowed to terminate before Docker is stopped).

* Updated the blogpost based on Saleema's and Julia's feedback.

* Fixed formatting... again.

* Added a main MANIFEST to list available scenarios.

* Limit main manifest to directories.

* Manifests now use relative paths.

* All manifests are now relative.

* Updated the contributing guide, and address windows path issues.

* Updated the version. Fixed formatting.

* Fixed formatting.

* De-listing Examples, since it has no clear tabulate criteria.

* Updated email in pyproject

* typo in blogpost

* wording

---------

Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu>
Co-authored-by: Qingyun Wu <qingyun0327@gmail.com>
2024-01-26 00:46:58 +00:00
Davor Runje
8f065e06e4
Add codespell to pre-commit hooks and fix spelling of existing files (#1161)
* fixed spelling, minor errors and reformatted using black

* polishing

* added codespell to pre-commit hooks, fixed a number of spelling errors and a few minor bugs in the code

* update autogen library version in notebooks

* update autogen library version in notebooks

* update autogen library version in notebooks

* update autogen library version in notebooks

* update autogen library version in notebooks
2024-01-07 01:41:33 +00:00
KazooTTT
a122ffe541
Fix/typo (#1034)
* fix: typo

* fix: typo

* fix: typo of function name

* fix: typo of function name of test file

* Update test_token_count.py

---------

Co-authored-by: Eric Zhu <ekzhu@users.noreply.github.com>
2023-12-22 16:00:46 +00:00
Yiran Wu
aa946b3507
Add MATH tests to testbed (#914)
* add MATH eval to testbed

* update

---------

Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu>
2023-12-18 14:37:28 +00:00
afourney
4dcb41531f
Allow users to specify the Docker image to use with Testbed (#986)
* Allow users to specify the Docker image to use (or build a good AutoGen default image if not specified).

* Added lmm and graphs to dockerfile
2023-12-15 20:37:22 +00:00
LeoLjl
2ee944df37
Add collate file and more tests from autogpt into testbed (#915)
* Add collate file.

* Add requirements.txt, Fix typo, Add tests

* More tests.

* Update check.py

* Update scenario.py

* Update prepare_autogpt.py

* Update prepare_autogpt.py

* More tasks for testset.

* Add more tests.

* Update docs.

* Optimize file organization.
2023-12-14 16:26:30 +00:00
afourney
f8b4b4259b
Adds the GAIA benchmark to the Testbed. This PR depends on #792 (#810)
* Re-added completion logging when using older versions of autogen.

* Extended scenario definitions and templating to include folders.

* Prepare collate_human_eval.py for working with group chat scenarios.

* Converted HumanEval to the folder-based approach, and added GroupChat scenarios.

* Fixed the default termination message.

* Fixed another termination condition.

* Updated compatible autogen versions.

* Added initial support for GAIA benchmark.

* Fixed a bug in executing the finalize scripts.

* Generalized the template further to support multiple folder copy operations.

* Refined GAIA support, and broke scenarios down by difficulty.

* Added some experimental scripts for computing metrics over GAIA. This is a first version, and will likely need refinement.

* Added instructions for cloning GAIA

* Updated README to fix some typos.

* Added a script to format GAIA results for the leaderboard.

* Update samples/tools/testbed/scenarios/GAIA/Templates/BasicTwoAgents/scenario.py

Co-authored-by: LeoLjl <3110503618@qq.com>

---------

Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu>
Co-authored-by: LeoLjl <3110503618@qq.com>
2023-12-06 01:46:10 +00:00
afourney
a107233e23
Testbed can now read the OPENAI_API_KEY in addition to the OAI_CONFIG_LIST (#848)
Co-authored-by: Victor Dibia <victordibia@microsoft.com>
2023-12-04 22:14:00 +00:00
afourney
45c2a78970
Testbed folders (#792)
* Re-added completion logging when using older versions of autogen.

* Extended scenario definitions and templating to include folders.

* Prepare collate_human_eval.py for working with group chat scenarios.

* Converted HumanEval to the folder-based approach, and added GroupChat scenarios.

* Fixed the default termination message.

* Fixed another termination condition.

* Updated compatible autogen versions.

* Fixed a bug in executing the finalize scripts.

* Generalized the template further to support multiple folder copy operations.

* Add tests from AutoGPT.

* Update README.md

* Fix typo

* Update samples/tools/testbed/README.md

---------

Co-authored-by: LeoLjl <3110503618@qq.com>
Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu>
2023-11-30 16:43:03 +00:00
afourney
f790109271
Re-added completion logging when using older versions of autogen. (#701) 2023-11-18 17:11:25 +00:00
afourney
b0a6d72b8c
Addresses issue 635, relating to newlines in Windows. (#678) 2023-11-15 22:27:10 +00:00
afourney
72f488e4d7
Allows users to specify a different requirements.txt file to install in Docker, to test other versions or branches of Autogen. Closes #662 (#671) 2023-11-15 00:33:09 +00:00
afourney
c37453735a
Sets the umask before executing the task in Docker. (#593)
* Sets the umask before executing the task in Docker.

* Added version backward compatibility for disabling cache and setting timeouts.
2023-11-14 21:14:38 +00:00
afourney
1c4a5e6a1a
Added a simple Testbed tool for repeatedly running templated Autogen scenarios with tightly-controlled initial conditions. (#455)
* Initial commit of the autogen testbed environment.

* Fixed some typos in the Testbed README.md

* Added some stricter termination logic to the two_agent scenario, and switched the logo task from finding Autogen's logo to finding Microsoft's (it's easier)

* Added documentation to testbed code in preparation for PR

* Added a variation of HumanEval to the Testbed. It is also a reasonable example of how to integrate other benchmarks.

* Removed ChatCompletion.start_logging and related features. Added an explicit TERMINATE output to HumanEval to save 1 turn in each conversation.

* Added metrics utils script for HumanEval

* Updated the requirements in the README.

* Added documentation for HumanEval csv schemas

* Standardized on how the OAI_CONFIG_LIST is handled.

* Removed dot-slash from 'includes' path for cross-platform compatibility

* Missed a file.

* Updated readme to include known-working versions.
2023-11-04 10:38:43 +00:00