Victor Dibia 32d2a18bf1
[Draft] Enable File Upload/Paste as Task in AGS (#6091)

## Why are these changes needed?



https://github.com/user-attachments/assets/e160f16d-f42d-49e2-a6c6-687e4e6786f4



Enable file upload/paste as a task in AGS. This enables tasks like:

- Can you research and fact check the ideas in this screenshot?
- Summarize this file

Only text and images are supported for now.
Underneath, it constructs `TextMessage` and `MultiModalMessage` objects as
the task.
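
As a rough illustration of the input this implies, here is a minimal sketch assuming each uploaded file reaches the backend as a dict with `name`, `type` (a MIME type), and base64-encoded `content`. The helpers `to_file_payload` and `message_kind` are hypothetical names used only for this example and are not part of AGS; the sketch uses only the standard library and does not construct real message objects.

```python
import base64
import mimetypes


def to_file_payload(name: str, data: bytes) -> dict[str, str]:
    """Build a file dict with name, MIME type, and base64-encoded content.

    Hypothetical helper for illustration; not part of AGS.
    """
    mime_type, _ = mimetypes.guess_type(name)
    return {
        "name": name,
        # Fall back to a generic binary type when the extension is unknown.
        "type": mime_type or "application/octet-stream",
        "content": base64.b64encode(data).decode("ascii"),
    }


def message_kind(file: dict[str, str]) -> str:
    """Name the message type a file would map to, based on its MIME prefix."""
    file_type = file.get("type", "")
    if file_type.startswith("image/"):
        return "MultiModalMessage"
    if file_type.startswith("text/"):
        return "TextMessage"
    return "unsupported"


payload = to_file_payload("notes.txt", b"Summarize this file")
print(payload["type"], "->", message_kind(payload))
```

Dispatching on the MIME prefix keeps the mapping simple: images become multimodal content, text is decoded into a plain text message, and anything else is flagged rather than silently dropped.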

## Related issue number

Closes #5773 

## Checks

- [ ] I've included any doc changes needed for
<https://microsoft.github.io/autogen/>. See
<https://github.com/microsoft/autogen/blob/main/CONTRIBUTING.md> to
build and test documentation locally.
- [ ] I've added tests (if relevant) corresponding to the changes
introduced in this PR.
- [ ] I've made sure all auto checks have passed.

---------

Co-authored-by: Jack Gerrits <jackgerrits@users.noreply.github.com>
2025-04-09 02:44:45 +00:00

72 lines
2.9 KiB
Python

import base64
from typing import Sequence

from autogen_agentchat.messages import ChatMessage, MultiModalMessage, TextMessage
from autogen_core import Image
from autogen_core.models import UserMessage
from loguru import logger


def construct_task(query: str, files: list[dict] | None = None) -> Sequence[ChatMessage]:
    """
    Construct a task from a query string and list of files.
    Returns a list of ChatMessage objects suitable for processing by the agent system.

    Args:
        query: The text query from the user
        files: List of file objects with properties name, content, and type

    Returns:
        List of ChatMessage objects (TextMessage, MultiModalMessage)
    """
    if files is None:
        files = []

    messages = []

    # Add the user's text query as a TextMessage
    if query:
        messages.append(TextMessage(source="user", content=query))

    # Process each file based on its type
    for file in files:
        try:
            if file.get("type", "").startswith("image/"):
                # Handle image file using from_base64 method
                # The content is already base64 encoded according to the convertFilesToBase64 function
                image = Image.from_base64(file["content"])
                messages.append(
                    MultiModalMessage(
                        source="user", content=[image], metadata={"filename": file.get("name", "unknown.img")}
                    )
                )
            elif file.get("type", "").startswith("text/"):
                # Handle text file as TextMessage
                text_content = base64.b64decode(file["content"]).decode("utf-8")
                messages.append(
                    TextMessage(
                        source="user", content=text_content, metadata={"filename": file.get("name", "unknown.txt")}
                    )
                )
            else:
                # Log unsupported file types but still try to process based on best guess
                logger.warning(f"Potentially unsupported file type: {file.get('type')} for file {file.get('name')}")
                if file.get("type", "").startswith("application/"):
                    # Try to treat as text if it's an application type (like JSON)
                    text_content = base64.b64decode(file["content"]).decode("utf-8")
                    messages.append(
                        TextMessage(
                            source="user",
                            content=text_content,
                            metadata={
                                "filename": file.get("name", "unknown.file"),
                                "filetype": file.get("type", "unknown"),
                            },
                        )
                    )
        except Exception as e:
            logger.error(f"Error processing file {file.get('name')}: {str(e)}")
            # Continue processing other files even if one fails

    return messages