This pull request introduces a new linting feature to the benchmark
configuration in the `agbench` package. The main changes include adding
a new command to the CLI, implementing the linter functionality, and
integrating it with the existing codebase.

### New Linting Feature:

* [`python/packages/agbench/src/agbench/cli.py`](diffhunk://#diff-0eafed70ad5e99e6f7319927bf92ee3ce4787d156dd2775b10a61baad7ec1799R10): Added `lint_cli` import and integrated the new "lint" command into the `main` function. [[1]](diffhunk://#diff-0eafed70ad5e99e6f7319927bf92ee3ce4787d156dd2775b10a61baad7ec1799R10) [[2]](diffhunk://#diff-0eafed70ad5e99e6f7319927bf92ee3ce4787d156dd2775b10a61baad7ec1799R37-R41)
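
Wiring a new subcommand into a `main` function can be sketched roughly as follows. This is a minimal illustration, not the actual `agbench` code: the `COMMANDS` table, the stub bodies of `run_cli` and `lint_cli`, and the string return values are all assumptions made for the example.

```python
import sys
from typing import Callable, Dict, Optional, Sequence


def run_cli(args: Sequence[str]) -> str:
    # Hypothetical stand-in for an existing subcommand.
    return f"run called with {list(args)}"


def lint_cli(args: Sequence[str]) -> str:
    # Hypothetical stand-in for the new linter entry point; it only echoes its arguments.
    return f"lint called with {list(args)}"


# Assumed dispatch table: subcommand name -> handler function.
COMMANDS: Dict[str, Callable[[Sequence[str]], str]] = {
    "run": run_cli,
    "lint": lint_cli,
}


def main(argv: Optional[Sequence[str]] = None) -> str:
    # Fall back to the process arguments when none are passed explicitly.
    args = list(sys.argv[1:] if argv is None else argv)
    if not args or args[0] not in COMMANDS:
        raise SystemExit("usage: agbench <" + "|".join(sorted(COMMANDS)) + "> ...")
    # Hand the remaining arguments to the selected subcommand.
    return COMMANDS[args[0]](args[1:])
```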

### Linter Implementation:

* [`python/packages/agbench/src/agbench/linter/__init__.py`](diffhunk://#diff-45842e728e3daad063b3cf84d5857a4fdfe14e6d977fb2054f284eb9f5bb5272R1-R4): Added necessary imports to initialize the linter module.
* [`python/packages/agbench/src/agbench/linter/_base.py`](diffhunk://#diff-f7ea2f6706232406b6c727fda6d71f09c568b4573f070af79bb7f3da3514e364R1-R81): Defined core classes such as `Document`, `Code`, `CodeExample`, `CodedDocument`, and the `BaseQualitativeCoder` protocol.
* [`python/packages/agbench/src/agbench/linter/cli.py`](diffhunk://#diff-e6ad1e14dc0df2c10fe62fede5a06d83865ad1961f99ec2d78f9052feb4d663bR1-R86): Implemented the `lint_cli` function, which includes loading log files, coding them, and printing the results.
* [`python/packages/agbench/src/agbench/linter/coders/oai_coder.py`](diffhunk://#diff-5059129410822c8a214f797a6167cbfcfbe31bd6a3b1efcb65a2dd703ef9b331R1-R212): Implemented the `OAIQualitativeCoder` class to interact with OpenAI for coding documents and caching results.

Example usage:
<img width="997" alt="image"
src="https://github.com/user-attachments/assets/6718688e-9917-4a43-a2f1-1105b030528d"
/>
<img width="999" alt="image"
src="https://github.com/user-attachments/assets/7fcb9c43-70f2-4fe7-ae29-5ad6a4ef2a16"
/>
> If you are in the VS Code terminal, you can click the links in the
terminal output to jump to the exact error.
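
The relationship between the linter's core classes and a coder with cached results might look roughly like the sketch below. Only the class names `Document`, `Code`, `CodeExample`, `CodedDocument`, and `BaseQualitativeCoder` come from the description above; every field name, the `code_document` method name, and the in-memory cache are assumptions. `KeywordCoder` is a hypothetical toy that uses a keyword match in place of an OpenAI call.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


@dataclass(frozen=True)
class Document:
    text: str       # assumed field: raw log contents
    name: str = ""  # assumed field: file name, used for reporting


@dataclass(frozen=True)
class CodeExample:
    line: int    # assumed field: 1-based line number where the issue appears
    reason: str  # assumed field: why this line was flagged


@dataclass
class Code:
    name: str
    definition: str
    examples: List[CodeExample] = field(default_factory=list)


@dataclass
class CodedDocument:
    doc: Document
    codes: List[Code]


class BaseQualitativeCoder(Protocol):
    # Assumed method name; any object with this method satisfies the protocol.
    def code_document(self, doc: Document) -> CodedDocument: ...


class KeywordCoder:
    """Toy coder that flags lines mentioning 'error' (stands in for an LLM-backed coder)."""

    def __init__(self) -> None:
        self._cache: Dict[str, CodedDocument] = {}
        self.calls = 0  # number of cache misses, i.e. real coding passes

    def code_document(self, doc: Document) -> CodedDocument:
        # Key the cache on the document contents so unchanged logs are not re-coded.
        key = hashlib.sha256(doc.text.encode("utf-8")).hexdigest()
        if key in self._cache:
            return self._cache[key]
        self.calls += 1
        examples = [
            CodeExample(line=number, reason=line.strip())
            for number, line in enumerate(doc.text.splitlines(), start=1)
            if "error" in line.lower()
        ]
        codes = [Code("error-in-log", "A log line mentions an error.", examples)] if examples else []
        coded = CodedDocument(doc=doc, codes=codes)
        self._cache[key] = coded
        return coded
```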
---------
Co-authored-by: afourney <adamfo@microsoft.com>
- Updated the HumanEval template to use AgentChat
- Updated templates to use config.yaml for model and other configuration
- The environment is now read from ENV.yaml (ENV.json is still supported but
deprecated)
- Temporarily removed WebArena and AssistantBench. Neither had viable
templates after `autogen_magentic_one` was removed. Their templates need to be
updated to AgentChat in a future PR, since this PR is already large enough.
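
Reading the environment from ENV.yaml with a deprecated ENV.json fallback could be sketched as below. The function name `load_env`, the choice to prefer the YAML file when both exist, and the warning text are assumptions for illustration. The YAML branch relies on PyYAML's `yaml.safe_load`.

```python
import json
import warnings
from pathlib import Path
from typing import Any, Dict


def load_env(directory: str = ".") -> Dict[str, Any]:
    """Return environment settings from ENV.yaml, falling back to ENV.json."""
    base = Path(directory)
    yaml_path = base / "ENV.yaml"
    json_path = base / "ENV.json"

    if yaml_path.exists():
        # Imported lazily so the JSON fallback still works without PyYAML installed.
        import yaml

        with yaml_path.open("r", encoding="utf-8") as handle:
            # An empty YAML file parses to None; normalize that to an empty dict.
            return yaml.safe_load(handle) or {}

    if json_path.exists():
        # Assumed behavior: the old format still loads, but the user is nudged to migrate.
        warnings.warn(
            "ENV.json is deprecated; use ENV.yaml instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        with json_path.open("r", encoding="utf-8") as handle:
            return json.load(handle)

    # Neither file is present: no extra environment settings.
    return {}
```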
This PR removes the older `autogen_magentic_one` package, and directs
people to use the new AgentChat implementation.
Hopefully this eases confusion.
---------
Co-authored-by: Jack Gerrits <jack@jackgerrits.com>
Co-authored-by: Eric Zhu <ekzhu@users.noreply.github.com>
* Fix definition of workspace package, remove uv pin
* add --all-packages
* pin docs uv versions for older project structure
* try old version to verify CI
* Use workflow target
* change syntax
* change check
* try with var in matrix
* add all packages to workspace
* remove project table
1. Convert dataclass types to pydantic `BaseModel`
2. Add `save_state` and `load_state` for `ChatAgent`
3. Add state types for AgentChat
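
A state type built on pydantic `BaseModel`, with matching `save_state` and `load_state` methods, might look like the sketch below. It assumes pydantic v2 (`model_dump` and `model_validate`). `EchoAgent`, `EchoAgentState`, and their fields are hypothetical names, and the methods are kept synchronous for brevity, which may differ from the real `ChatAgent` interface.

```python
from typing import Any, List, Mapping

from pydantic import BaseModel, Field


class EchoAgentState(BaseModel):
    # Hypothetical state type: a type tag plus the messages seen so far.
    type: str = "EchoAgentState"
    history: List[str] = Field(default_factory=list)


class EchoAgent:
    """Hypothetical agent that records every message it receives."""

    def __init__(self) -> None:
        self._history: List[str] = []

    def on_message(self, text: str) -> str:
        self._history.append(text)
        return text

    def save_state(self) -> Mapping[str, Any]:
        # Serialize through the model so the output is a plain, JSON-friendly dict.
        return EchoAgentState(history=list(self._history)).model_dump()

    def load_state(self, state: Mapping[str, Any]) -> None:
        # Validation rejects malformed state before it reaches the agent.
        self._history = list(EchoAgentState.model_validate(state).history)
```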
---------
Co-authored-by: Eric Zhu <ekzhu@users.noreply.github.com>
* add tests to ruff for core
* fmt
* lint
* lint fixes
* fixup more dirs
* don't include non-python
* lint fixes
* lint fixes
* fix dir name
* don't relative include
* Migrate to uv and poe for workspace management and task running
* install python
* try fix
* ensure workspace venv is used
* package dir
* move nbqa to mypy task
* separate sync, clarify docs