---
title: EcoAssistant - Using LLM Assistants More Accurately and Affordably
authors: jieyuz2
tags: [LLM, RAG, cost-effectiveness]
---

**TL;DR:**
* Introducing **EcoAssistant**, a system designed to solve user queries more accurately and affordably.
* We show how to let the LLM assistant agent leverage external APIs to solve user queries.
* We show how to reduce the cost of using GPT models via the **Assistant Hierarchy**.
* We show how to leverage the idea of Retrieval-Augmented Generation (RAG) to improve the success rate via **Solution Demonstration**.

## EcoAssistant

In this blog, we introduce the **EcoAssistant**, a system built upon AutoGen with the goal of solving user queries more accurately and affordably.

### Problem setup

Recently, users have been using conversational LLMs such as ChatGPT for various queries.
Reports indicate that 23% of ChatGPT user queries are for knowledge extraction purposes.
Many of these queries require knowledge that is external to the information stored within any pre-trained large language models (LLMs).
Such tasks can only be completed by generating code that fetches the necessary information via external APIs.
In the table below, we show three types of user queries that we aim to address in this work.

| Dataset | API      | Example query |
|---------|----------|---------------|
| Places  | [Google Places](https://developers.google.com/maps/documentation/places/web-service/overview) | I’m looking for a 24-hour pharmacy in Montreal, can you find one for me? |
| Weather | [Weather API](https://www.weatherapi.com) | What is the current cloud coverage in Mumbai, India? |
| Stock   | [Alpha Vantage Stock API](https://www.alphavantage.co/documentation/) | Can you give me the opening price of Microsoft for the month of January 2023? |

### Leveraging external APIs

To address these queries, we first build a **two-agent system** based on AutoGen,
where the first agent is an **LLM assistant agent** (`AssistantAgent` in AutoGen) that is responsible for proposing and refining the code, and
the second agent is a **code executor agent** (`UserProxyAgent` in AutoGen) that extracts the generated code, executes it, and forwards the output back to the LLM assistant agent.
A visualization of the two-agent system is shown below.



To instruct the assistant agent to leverage external APIs, we only need to add the API name/key dictionary to the beginning of the initial message.
The template is shown below, where the red part is the API information and the black part is the user query.



Importantly, we don't want to reveal our real API key to the assistant agent due to safety concerns.
Therefore, we use a **fake API key** to replace the real API key in the initial message.
In particular, we generate a random token (e.g., `181dbb37`) for each API key and replace the real API key with the token in the initial message.
Then, when the code executor executes the code, the fake API key is automatically replaced by the real API key.

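One way to implement this substitution (a sketch of our own, not the paper's exact code) is to map each random token back to the real key right before execution:

```python
import secrets

# Only the random tokens are ever revealed to the assistant agent.
real_keys = {"WEATHER_API_KEY": "<the-real-secret-key>"}
fake_keys, token_to_real = {}, {}
for name, real in real_keys.items():
    token = secrets.token_hex(4)  # e.g., "181dbb37"
    fake_keys[name] = token
    token_to_real[token] = real

def restore_real_keys(code: str) -> str:
    """Swap the fake tokens back to the real keys before running the code."""
    for token, real in token_to_real.items():
        code = code.replace(token, real)
    return code
```
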
### Solution Demonstration

In most practical scenarios, queries from users appear sequentially over time.
Our **EcoAssistant** leverages past successes to help the LLM assistants address future queries via **Solution Demonstration**.
Specifically, whenever a query is deemed successfully resolved by user feedback, we capture and store the query and the final generated code snippet.
These query-code pairs are saved in a specialized vector database. When a new query appears, **EcoAssistant** retrieves the most similar query from the database, which is then appended, together with the associated code, to the initial prompt for the new query, serving as a demonstration.
The new template of the initial message is shown below, where the blue part corresponds to the solution demonstration.



We found that utilizing past successful query-code pairs lets the system resolve queries in fewer iterations and enhances its performance.

### Assistant Hierarchy

LLMs usually differ in price and performance; for example, GPT-3.5-turbo is much cheaper than GPT-4 but also less accurate.
Thus, we propose the **Assistant Hierarchy** to reduce the cost of using LLMs.
The core idea is to use the cheaper LLMs first and only turn to the more expensive LLMs when necessary.
In this way, we reduce the reliance on expensive LLMs and thus reduce the cost.
In particular, given multiple LLMs, we initiate one assistant agent for each and start the conversation with the most cost-effective LLM assistant.
If the conversation between the current LLM assistant and the code executor concludes without successfully resolving the query, **EcoAssistant** restarts the conversation with the next, more expensive LLM assistant in the hierarchy.
We found that this strategy significantly reduces costs while still effectively addressing queries.
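A sketch of the hierarchy loop (with `is_resolved` as a hypothetical stand-in for the user-feedback check):

```python
from autogen import AssistantAgent, UserProxyAgent

MODEL_HIERARCHY = ["gpt-3.5-turbo", "gpt-4"]  # cheapest first

def solve(query: str, user_proxy: UserProxyAgent) -> bool:
    """Try each model in order of cost, escalating only on failure."""
    for model in MODEL_HIERARCHY:
        assistant = AssistantAgent(
            name="assistant",  # a fresh assistant backed by this model
            llm_config={"model": model},
        )
        user_proxy.initiate_chat(assistant, message=query)
        if is_resolved(user_proxy.last_message(assistant)):
            return True  # solved; no need for a more expensive model
    return False
```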

### A Synergistic Effect

We found that the **Assistant Hierarchy** and **Solution Demonstration** of **EcoAssistant** have a synergistic effect.
Because the query-code database is shared by all LLM assistants, even without specialized design,
the solution from a more powerful LLM assistant (e.g., GPT-4) can later be retrieved to guide a weaker LLM assistant (e.g., GPT-3.5-turbo).
Such a synergistic effect further improves the performance and reduces the cost of **EcoAssistant**.

### Experimental Results

We evaluate **EcoAssistant** on three datasets: Places, Weather, and Stock. When comparing it with a single GPT-4 assistant, we found that **EcoAssistant** achieves a higher success rate at a lower cost, as shown in the figure below.
For more details about the experimental results and other experiments, please refer to our [paper](https://arxiv.org/abs/2310.03046).



## Further reading

Please refer to our [paper](https://arxiv.org/abs/2310.03046) and [codebase](https://github.com/JieyuZ2/EcoAssistant) for more details about **EcoAssistant**.

If you find this blog useful, please consider citing:

```bibtex
@article{zhang2023ecoassistant,
  title={EcoAssistant: Using LLM Assistant More Affordably and Accurately},
  author={Zhang, Jieyu and Krishna, Ranjay and Awadallah, Ahmed H and Wang, Chi},
  journal={arXiv preprint arXiv:2310.03046},
  year={2023}
}
```