mirror of https://github.com/microsoft/autogen.git synced 2025-12-15 17:18:30 +00:00

History

[BugFix][Refactor] Modular Transformer Pipeline and Fix Gemini/Anthropic Empty Content Handling (#6063 )

## Why are these changes needed?
This change addresses a compatibility issue when using Google Gemini
models with AutoGen. Specifically, Gemini returns a 400 INVALID_ARGUMENT
error when receiving a response with an empty "text" parameter.

The root cause is that Gemini does not accept empty string values (e.g.,
"") as valid inputs in the history of the conversation.

To fix this, if the content field is falsy (e.g., None, "", etc.), it is
explicitly replaced with a single whitespace (" "), which prevents the
Gemini model from rejecting the request.

- **Gemini API compatibility:** Gemini models reject empty assistant
messages (e.g., `""`), causing runtime errors. This PR ensures such
messages are safely replaced with whitespace where appropriate.
- **Avoiding regressions:** Applying the empty content workaround **only
to Gemini**, and **only to valid message types**, avoids breaking OpenAI
or other models.
- **Reducing duplication:** Previously, message transformation logic was
scattered and repeated across different message types and models.
Modularizing this pipeline removes that redundancy.
- **Improved maintainability:** With future model variants likely to
introduce more constraints, this modular structure makes it easier to
adapt transformations without writing ad-hoc code each time.
- **Testing for correctness:** The new structure is verified with tests,
ensuring the bug fix is effective and non-intrusive.

## Summary

This PR introduces a **modular transformer pipeline** for message
conversion and **fixes a Gemini-specific bug** related to empty
assistant message content.

### Key Changes

- **[Refactor]** Extracted message transformation logic into a unified
pipeline to:
  - Reduce code duplication
  - Improve maintainability
  - Simplify debugging and extension for future model-specific logic

- **[BugFix]** Gemini models do not accept empty assistant message
content.
- Introduced `_set_empty_to_whitespace` transformer to replace empty
strings with `" "` only where needed
- Applied it **only** to `"text"` and `"thought"` message types, not to
`"tools"` to avoid serialization errors

- **Improved structure for model-specific handling**
- Transformer functions are now grouped and conditionally applied based
on message type and model family
- This design makes it easier to support future models or combinations
(e.g., Gemini + R1)

- **Test coverage added**
- Added dedicated tests to verify that empty assistant content causes
errors for Gemini
  - Ensured the fix resolves the issue without affecting OpenAI models

---

## Motivation

Originally, Gemini-compatible endpoints would fail when receiving
assistant messages with empty content (`""`).
This issue required special handling without introducing brittle, ad-hoc
patches.

In addressing this, I also saw an opportunity to **modularize** the
message transformation logic across models.
This improves clarity, avoids duplication, and simplifies future
adaptations (e.g., different constraints across model families).

---


## 📘 AutoGen Modular Message Transformer: Design & Usage Guide

This document introduces the **new modular transformer system** used in
AutoGen for converting `LLMMessage` instances to SDK-specific message
formats (e.g., OpenAI-style `ChatCompletionMessageParam`).
The design improves **reusability, extensibility**, and
**maintainability** across different model families.

---

### 🚀 Overview

Instead of scattering model-specific message conversion logic across the
codebase, the new design introduces:

- Modular transformer **functions** for each message type
- Per-model **transformer maps** (e.g., for OpenAI-compatible models)
- Optional **conditional transformers** for multimodal/text hybrid
models
- Clear separation between **message adaptation logic** and
**SDK-specific builder** (e.g., `ChatCompletionUserMessageParam`)

---

### 🧱 1. Define Transform Functions

Each transformer function takes:
- `LLMMessage`: a structured AutoGen message
- `context: dict`: metadata passed through the builder pipeline

And returns:
- A dictionary of keyword arguments for the target message constructor
(e.g., `{"content": ..., "name": ..., "role": ...}`)

```python
def _set_thought_as_content_gemini(message: LLMMessage, context: Dict[str, Any]) -> Dict[str, str | None]:
    assert isinstance(message, AssistantMessage)
    return {"content": message.thought or " "}
```

---

### 🪢 2. Compose Transformer Pipelines

Multiple transformer functions are composed into a pipeline using
`build_transformer_func()`:

```python
base_user_transformer_funcs: List[Callable[[LLMMessage, Dict[str, Any]], Dict[str, Any]]] = [
    _assert_valid_name,
    _set_name,
    _set_role("user"),
]

user_transformer = build_transformer_func(
    funcs=base_user_transformer_funcs,
    message_param_func=ChatCompletionUserMessageParam
)
```

- The `message_param_func` is the actual constructor for the target
message class (usually from the SDK).
- The pipeline is **ordered** — each function adds or overrides keys in
the builder kwargs.

---

### 🗂️ 3. Register Transformer Map

Each model family maintains a `TransformerMap`, which maps `LLMMessage`
types to transformers:

```python
__BASE_TRANSFORMER_MAP: TransformerMap = {
    SystemMessage: system_transformer,
    UserMessage: user_transformer,
    AssistantMessage: assistant_transformer,
}

register_transformer("openai", model_name_or_family, __BASE_TRANSFORMER_MAP)
```

- `"openai"` is currently required (as only OpenAI-compatible format is
supported now).
- Registration ensures AutoGen knows how to transform each message type
for that model.

---

### 🔁 4. Conditional Transformers (Optional)

When message construction depends on runtime conditions (e.g., `"text"`
vs. `"multimodal"`), use:

```python
conditional_transformer = build_conditional_transformer_func(
    funcs_map=user_transformer_funcs_claude,
    message_param_func_map=user_transformer_constructors,
    condition_func=user_condition,
)
```

Where:

- `funcs_map`: maps condition label → list of transformer functions
```python
user_transformer_funcs_claude = {
    "text": text_transformers + [_set_empty_to_whitespace],
    "multimodal": multimodal_transformers + [_set_empty_to_whitespace],
}
```

- `message_param_func_map`: maps condition label → message builder
```python
user_transformer_constructors = {
    "text": ChatCompletionUserMessageParam,
    "multimodal": ChatCompletionUserMessageParam,
}
```

- `condition_func`: determines which transformer to apply at runtime
```python
def user_condition(message: LLMMessage, context: Dict[str, Any]) -> str:
    if isinstance(message.content, str):
        return "text"
    return "multimodal"
```

---

### 🧪 Example Flow

```python
llm_message = AssistantMessage(name="a", thought="let’s go")
model_family = "openai"
model_name = "claude-3-opus"

transformer = get_transformer(model_family, model_name, type(llm_message))
sdk_message = transformer(llm_message, context={})
```

---

### 🎯 Design Benefits

| Feature | Benefit |
|--------|---------|
| 🧱 Function-based modular design | Easy to compose and test |
| 🧩 Per-model registry | Clean separation across model families |
| ⚖️ Conditional support | Allows multimodal / dynamic adaptation |
| 🔄 Reuse-friendly | Shared logic (e.g., `_set_name`) is DRY |
| 📦 SDK-specific | Keeps message adaptation aligned to builder interface
|

---

### 🔮 Future Direction

- Support more SDKs and formats by introducing new message_param_func
- Global registry integration (currently `"openai"`-scoped)
- Class-based transformer variant if complexity grows



---

## Related issue number
Closes #5762

## Checks

- [ ] I've included any doc changes needed for
<https://microsoft.github.io/autogen/>. See
<https://github.com/microsoft/autogen/blob/main/CONTRIBUTING.md> to
build and test documentation locally.
- [x] I've added tests (if relevant) corresponding to the changes
introduced in this PR.
- [ v ] I've made sure all auto checks have passed.

---------

Co-authored-by: Eric Zhu <ekzhu@users.noreply.github.com>

2025-03-30 21:09:30 -07:00

docs

Rename to use BaseChatMessage and BaseAgentEvent. Bring back union types. (#6144 )

2025-03-30 09:34:40 -07:00

src/autogen_core

[BugFix][Refactor] Modular Transformer Pipeline and Fix Gemini/Anthropic Empty Content Handling (#6063 )

2025-03-30 21:09:30 -07:00

tests

Fix token limited model context (#6137 )

2025-03-28 17:24:41 +00:00

.gitignore

API documentation page flattening (#4556 )

2024-12-04 18:24:07 -08:00

LICENSE-CODE

Include license file in package (#3703 )

2024-10-09 15:01:09 -04:00

pyproject.toml

update version to v0.4.9 (#5903 )

2025-03-11 19:35:22 -07:00

README.md

Make package readmes slightly less empty (#4961 )

2025-01-09 10:44:13 -08:00

README.md

AutoGen Core

Documentation

AutoGen core offers an easy way to quickly build event-driven, distributed, scalable, resilient AI agent systems. Agents are developed by using the Actor model. You can build and run your agent system locally and easily move to a distributed system in the cloud when you are ready.