- Title: MRKLAgent
- Decision driver: @julian-risch (in close collaboration with @vblagoje )
- Start Date: 2023-01-27
- Proposal PR: https://github.com/deepset-ai/haystack/pull/3925
- Github Issue or Discussion: https://github.com/deepset-ai/haystack/issues/3753
Summary
The Agent class answers queries by choosing between different tools, which are implemented as pipelines or nodes. It uses a large language model (LLM) to generate a thought based on the query, choose a tool, and generate the input for the tool. Based on the result returned by an action/tool (used interchangeably), the Agent has two options. It can either stop if it knows the answer now or repeat the process of 1) thought, 2) action choice, 3) action input.
The Agent can be used for questions containing multiple subquestions that can be answered step-by-step (Multihop QA). Combined with tools like the PythonRuntime or SerpAPIComponent we imagine for Haystack, the Agent can query the web and do calculations.
We have a notebook that demonstrates how to use an Agent with two tools: PythonRuntime and SerpAPIComponent. It requires API keys for OpenAI and SerpAPI. The notebook is based on the branch https://github.com/deepset-ai/haystack/compare/main...mrkl-pipeline (no pull request)
Basic example
An example of an Agent could use two tools: a web search engine and a calculator.
The query "Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?" can be broken down into three steps:
- Searching the web for the name of Olivia Wilde's boyfriend
- Searching the web for the age of that boyfriend
- Calculating that age raised to the 0.23 power
The Agent would then respond with "Jason Sudeikis, Olivia Wilde's boyfriend, is 47 years old and his age raised to the 0.23 power is 2.4242784855673896." A detailed walk-through follows below.
Motivation
With an Agent, users can combine multiple LLMs and tools, so that they can build a truly powerful app. They can use an LLM in a loop to answer more complex questions than with ExtractiveQA or GenerativeQA. With an Agent and a tool for web search, Haystack is not limited to extracting answers from a document store or generating answers based on model weights anymore but it can use the knowledge it retrieves on-the-fly from the web. Thereby, the model's knowledge does not get outdated.
In the future, we envision that an Agent could use tools not only for retrieving knowledge but also for interacting with the world. For example, it could periodically skim through newly opened issues in Haystack's GitHub repository. If a question can be answered based on the documentation, the Agent could retrieve relevant pages from the documentation, generate an answer, and post it as a first response to the issue.
Detailed design
Glossary
- Thought: First part of a prompt generated by the Agent. It serves to break down the query into a plan, for example, which part of the question needs to be answered first.
- Action (or tool): Actions/tools are Haystack pipelines or nodes that the Agent can use to answer a question. We use tool and action interchangeably in this proposal until we decide on the best naming. The choice of a tool in each iteration is the middle part of a prompt generated by the Agent.
- Action input: Last part of a prompt generated by an Agent. It serves as the input to a tool that the Agent uses to answer a question.
- Observation: The output generated by a tool and sent back to the Agent.
The Agent consists of a PromptNode that generates thoughts, chooses actions, and generates action inputs. Just like Haystack pipelines, an Agent can be loaded from a YAML file. That YAML file must also contain the tools of the Agent defined as pipelines or nodes. Tools need to be added to an Agent so that it can use them, just like nodes need to be added to pipelines. When a tool is added to an Agent, a description of the tool needs to be added as well so that the LLM knows when the tool is useful.
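To make the role of the tool description concrete, here is a minimal sketch of how tools might be registered together with their descriptions. `Tool` and `ToolRegistry` are hypothetical names used only for illustration; they are not part of Haystack or of the proposed Agent API, and plain callables stand in for pipelines and nodes.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Tool:
    """Hypothetical container for one tool; not a Haystack class."""
    name: str
    pipeline_or_node: Callable[[str], Any]  # stand-in for a pipeline or node
    description: str


class ToolRegistry:
    """Hypothetical stand-in for the tool bookkeeping inside an Agent."""

    def __init__(self) -> None:
        self.tools: Dict[str, Tool] = {}

    def add_tool(self, name: str, pipeline_or_node: Callable[[str], Any], description: str) -> None:
        # Without a description the LLM cannot know when the tool is useful.
        if not description.strip():
            raise ValueError(f"Tool '{name}' needs a description")
        if name in self.tools:
            raise ValueError(f"Tool '{name}' is already registered")
        self.tools[name] = Tool(name, pipeline_or_node, description)

    def describe(self) -> str:
        # One "<name>: <description>" line per tool, as in the prompt transcript.
        return "\n".join(f"{t.name}: {t.description}" for t in self.tools.values())


registry = ToolRegistry()
registry.add_tool("Search", lambda q: "stub result", "useful for when you need to answer questions about current events. You should ask targeted questions")
registry.add_tool("Calculator", lambda q: "stub result", "useful for when you need to answer questions about math")
tool_listing = registry.describe()
```

The resulting `tool_listing` has one line per tool, which is the form in which the tools appear in the prompt.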
A key functionality of the Agent is that it can act iteratively and use any of the pre-defined tools as many times as it wants based on the input query and the results returned from the tools used earlier. In every iteration, it chooses one of the tools and generates the input for that tool dynamically. An example application of this is Multihop QA, where multiple subquestions need to be answered step-by-step. For the example query "Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?", the Agent needs to answer several subquestions. Here is an example of a full transcript of the prompt input and generated output:
Answer the following questions as best as you can. You have access to the following tools:
Search: useful for when you need to answer questions about current events. You should ask targeted questions
Calculator: useful for when you need to answer questions about math
Use the following format:
Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [Search, Calculator]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question
Begin!
Question: Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?
Thought: I need to do some research to answer this question.
Action: Search
Action Input: Olivia Wilde's boyfriend
Observation: First linked in November 2011, Wilde and Sudeikis got engaged in January 2013. They later became parents, welcoming son Otis in 2014 and daughter Daisy in 2016.
Thought: I need to find out his age
Action: Search
Action Input: Jason Sudeikis age
Observation: 47 years
Thought: I need to raise it to the 0.23 power
Action: Calculator
Action Input: 47^0.23
Observation: 2.4242784855673896
Thought: I now know the final answer
Final Answer: Jason Sudeikis, Olivia Wilde's boyfriend, is 47 years old and his age raised to the 0.23 power is 2.4242784855673896.
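The instruction part of the transcript above could be assembled from the registered tools and the query. The following is a minimal sketch under that assumption; `PROMPT_TEMPLATE` and `build_prompt` are hypothetical names, and the tools are represented by a plain name-to-description mapping rather than by Haystack objects.

```python
from typing import Dict

# Template text copied from the transcript; only the tool listing,
# the tool names, and the query are filled in at runtime.
PROMPT_TEMPLATE = (
    "Answer the following questions as best as you can. "
    "You have access to the following tools:\n"
    "{tool_descriptions}\n"
    "Use the following format:\n"
    "Question: the input question you must answer\n"
    "Thought: you should always think about what to do\n"
    "Action: the action to take, should be one of [{tool_names}]\n"
    "Action Input: the input to the action\n"
    "Observation: the result of the action\n"
    "... (this Thought/Action/Action Input/Observation can repeat N times)\n"
    "Thought: I now know the final answer\n"
    "Final Answer: the final answer to the original input question\n"
    "Begin!\n"
    "Question: {query}\n"
    "Thought:"  # the prompt ends here so that the LLM continues with a thought
)


def build_prompt(tools: Dict[str, str], query: str) -> str:
    """Fill the template with tool descriptions, tool names, and the query."""
    tool_descriptions = "\n".join(f"{name}: {desc}" for name, desc in tools.items())
    return PROMPT_TEMPLATE.format(
        tool_descriptions=tool_descriptions,
        tool_names=", ".join(tools),
        query=query,
    )


prompt = build_prompt(
    {
        "Search": "useful for when you need to answer questions about current events. You should ask targeted questions",
        "Calculator": "useful for when you need to answer questions about math",
    },
    "Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?",
)
```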
Agent steps breakdown
The above steps represent the entire action trace for the Agent. However, let's break it down into individual agent steps so we can understand how it makes decisions, chooses actions and action inputs.
Step 1:
We start with a prompt that instructs the LLM on what we want. The first prompt we send to the LLM is the following:
Answer the following questions as best as you can. You have access to the following tools:
Search: useful for when you need to answer questions about current events. You should ask targeted questions
Calculator: useful for when you need to answer questions about math
Use the following format:
Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [Search, Calculator]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final Answer
Final Answer: the final Answer to the original input question
Begin!
Question: Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?
Thought:
Notice how we finish the prompt with the Thought: token, priming the model to start its generation with a plan of what needs to be done in the first step.
The LLM also generates the Action: and Action Input: rows of this step, which tell us which action to execute and which input to pass to it.
Because we instruct the model to stop generating as soon as it reaches the stop word Observation:, the model response for this step is:
I need to do some research to answer this question.
Action: Search
Action Input: Olivia Wilde's boyfriend
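The effect of the stop word can be illustrated with a small client-side sketch. Hosted LLM APIs typically apply stop sequences themselves, so the hypothetical helper `truncate_at_stop_words` below only demonstrates the behavior; it is not part of the proposal. The invented observation in the sample text is exactly the kind of output the stop word prevents.

```python
from typing import List


def truncate_at_stop_words(text: str, stop_words: List[str]) -> str:
    """Cut the generated text at the earliest occurrence of any stop word."""
    cut = len(text)
    for stop_word in stop_words:
        index = text.find(stop_word)
        if index != -1:
            cut = min(cut, index)
    return text[:cut].rstrip()


# Without a stop word, the model might continue and make up an observation.
raw_generation = (
    " I need to do some research to answer this question.\n"
    "Action: Search\n"
    "Action Input: Olivia Wilde's boyfriend\n"
    "Observation: a result the model invented instead of calling the tool\n"
)
step_output = truncate_at_stop_words(raw_generation, ["Observation:"])
```

After truncation, `step_output` ends with the Action Input: row, so the real tool result can be supplied as the observation.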
At this point, we invoke Search (along with the input) and receive the response from the Search tool: "First linked in November 2011, Wilde and Sudeikis got engaged in January 2013. They later became parents, welcoming son Otis in 2014 and daughter Daisy in 2016."
We append the LLM generation above to the initial prompt, followed by the response from the Search tool under the Observation: keyword.
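A minimal sketch of this step, assuming the LLM follows the requested format: a regular expression extracts the action and its input, and the observation is appended to the prompt so that it again ends with Thought:. The names `ACTION_PATTERN`, `parse_action`, and `append_step` are hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches "Action: <tool>" followed by "Action Input: <input>" on the next line.
ACTION_PATTERN = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.+)", re.DOTALL)


def parse_action(llm_response: str) -> Optional[Tuple[str, str]]:
    """Return (tool name, tool input), or None if the format was not followed."""
    match = ACTION_PATTERN.search(llm_response)
    if match is None:
        return None
    return match.group(1).strip(), match.group(2).strip()


def append_step(prompt: str, llm_response: str, observation: str) -> str:
    """Extend the prompt with the LLM response and the tool's observation."""
    return f"{prompt} {llm_response.strip()}\nObservation: {observation}\nThought:"


llm_response = (
    "I need to do some research to answer this question.\n"
    "Action: Search\n"
    "Action Input: Olivia Wilde's boyfriend"
)
parsed = parse_action(llm_response)
next_prompt = append_step(
    "Question: Who is Olivia Wilde's boyfriend?\nThought:",
    llm_response,
    "First linked in November 2011, Wilde and Sudeikis got engaged in January 2013.",
)
```

Returning `None` for an unparsable response leaves it to the caller to decide whether to retry or abort.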
Step 2:
We start this step with the following prompt:
Answer the following questions as best as you can. You have access to the following tools:
Search: useful for when you need to answer questions about current events. You should ask targeted questions
Calculator: useful for when you need to answer questions about math
Use the following format:
Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [Search, Calculator]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final Answer
Final Answer: the final Answer to the original input question
Begin!
Question: Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?
Thought: I need to do some research to answer this question.
Action: Search
Action Input: Olivia Wilde's boyfriend
Observation: First linked in November 2011, Wilde and Sudeikis got engaged in January 2013. They later became parents, welcoming son Otis in 2014 and daughter Daisy in 2016.
Thought:
Again, notice how we've added the response from the LLM and the Observation from the tool to the prompt, and how we finish the prompt with the Thought: token, priming the model to start the response with the plan for this step. As in the previous step, the model generates an action plan and selects an action and its input. The LLM response is:
I need to find out his age
Action: Search
Action Input: Jason Sudeikis age
The LLM response above gives us enough information to invoke the Search tool again with the appropriate input, and we receive the response from Search: 47 years. We add this response to the prompt history under the Observation: keyword.
Step 3:
For the sake of brevity, let's not list the entire prompt again. The critical part to remember is that we append the output of step 2 to the prompt history we are creating as we step through each agent step. These so-called reasoning traces help agents "understand" what needs to be done in each successive step. The last part of the prompt is the following:
Thought: I need to find out his age
Action: Search
Action Input: Jason Sudeikis age
Observation: 47 years
Thought:
The LLM-generated response is:
I need to raise it to the 0.23 power
Action: Calculator
Action Input: 47^0.23
In this step, we invoke a new tool, the Calculator, with the specified input. The calculator response is 2.4242784855673896. We add the calculator response to the prompt history under the Observation: keyword.
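To check the arithmetic of this step, here is a small stand-in calculator. Note that this is not the design proposed here, where an LLM writes Python code that PythonRuntime executes; the sketch swaps in a restricted expression evaluator based on Python's `ast` module, and `calculate` is a hypothetical name.

```python
import ast
import operator

# Only these binary operators are allowed in an expression.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}


def _evaluate(node: ast.AST) -> float:
    """Recursively evaluate numbers, binary operations, and unary minus."""
    if (
        isinstance(node, ast.Constant)
        and isinstance(node.value, (int, float))
        and not isinstance(node.value, bool)
    ):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_evaluate(node.operand)
    raise ValueError("unsupported expression")


def calculate(action_input: str) -> float:
    """Evaluate an arithmetic expression, reading '^' as exponentiation."""
    expression = action_input.replace("^", "**")
    tree = ast.parse(expression, mode="eval")
    return _evaluate(tree.body)


observation = calculate("47^0.23")  # roughly 2.4243, in line with the transcript
```

Anything other than plain arithmetic, such as a function call or a variable name, raises a `ValueError` instead of being executed.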
Step 4:
Again, we append the LLM response and the calculator response to the prompt history. Let's not list the entire prompt, but only the last few lines:
I need to raise it to the 0.23 power
Action: Calculator
Action Input: 47^0.23
Observation: 2.4242784855673896
Thought:
The LLM-generated response is:
I now know the final answer
Final Answer: Jason Sudeikis, Olivia Wilde's boyfriend, is 47 years old and his age raised to the 0.23 power is 2.4242784855673896.
Using simple string parsing, we can detect that the model in this step responded with the "Final Answer:" keyword just as we instructed. We then break out of the loop and complete the agent's task by returning a response to the agent's client. In the rare case that "Final Answer:" is not generated even after many iterations, we break out of the loop once a maximum number of iterations is reached. This prevents an infinite loop.
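Putting the four steps together, the loop could look like the following sketch. It is an illustration under stated assumptions, not the proposed implementation: `run_agent` is a hypothetical function, the LLM is replaced by a scripted stand-in that replays the responses from the walk-through, and the tools return canned observations taken from the transcript.

```python
import re
from typing import Callable, Dict

FINAL_ANSWER = "Final Answer:"
ACTION_PATTERN = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.+)", re.DOTALL)


def run_agent(
    prompt: str,
    llm: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    max_iterations: int = 5,
) -> str:
    """Repeat thought, action, and observation until a final answer appears."""
    transcript = prompt  # expected to end with "Thought:"
    for _ in range(max_iterations):
        response = llm(transcript).strip()
        if FINAL_ANSWER in response:
            return response.split(FINAL_ANSWER, 1)[1].strip()
        match = ACTION_PATTERN.search(response)
        if match is None:
            raise ValueError("LLM response contains neither an action nor a final answer")
        tool_name, tool_input = match.group(1).strip(), match.group(2).strip()
        if tool_name not in tools:
            raise ValueError(f"Unknown tool: {tool_name}")
        observation = tools[tool_name](tool_input)
        transcript = f"{transcript} {response}\nObservation: {observation}\nThought:"
    # Guard against an infinite loop if "Final Answer:" never shows up.
    raise RuntimeError(f"No final answer after {max_iterations} iterations")


# Scripted stand-in for the LLM, replaying the responses of steps 1 to 4.
scripted = iter([
    "I need to do some research to answer this question.\nAction: Search\nAction Input: Olivia Wilde's boyfriend",
    "I need to find out his age\nAction: Search\nAction Input: Jason Sudeikis age",
    "I need to raise it to the 0.23 power\nAction: Calculator\nAction Input: 47^0.23",
    "I now know the final answer\nFinal Answer: Jason Sudeikis, Olivia Wilde's boyfriend, is 47 years old "
    "and his age raised to the 0.23 power is 2.4242784855673896.",
])

# Canned tool outputs copied from the transcript.
tools = {
    "Search": lambda q: "47 years" if "age" in q else "First linked in November 2011, Wilde and Sudeikis got engaged in January 2013.",
    "Calculator": lambda q: "2.4242784855673896",
}

answer = run_agent(
    "Question: Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?\nThought:",
    lambda transcript: next(scripted),
    tools,
)
```

Raising an error when the iteration limit is reached is one possible design choice; returning the partial transcript instead would be another.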
Agent Creation
The Agent can be either created programmatically or loaded from a YAML file. In the following example, one tool is a node for searching the web. The other tool is a pipeline for doing calculations in Python.
Example programmatic creation:
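import os

from haystack import Pipeline
from haystack.nodes import PromptModel, PromptNode, PromptTemplate
# Agent, SerpAPIComponent, and PythonRuntime are proposed here and cannot be imported from Haystack yet.

# Tool 1: a node that searches the web
search = SerpAPIComponent(api_key=os.environ.get("SERPAPI_API_KEY"), name="Serp", inputs=["Query"])
# One PromptModel is shared by the calculator pipeline and the Agent's PromptNode
prompt_model = PromptModel(model_name_or_path="text-davinci-003", api_key=os.environ.get("OPENAI_API_KEY"))

# Tool 2: a pipeline that generates Python code and then executes it
calculator = Pipeline()
calculator.add_node(
    component=PromptNode(
        model_name_or_path=prompt_model,
        default_prompt_template=PromptTemplate(prompt_text="Write a simple python function that calculates..."),
        output_variable="python_runtime_input",  # input for the PythonRuntime node
    ),
    name="CalculatorInput",
    inputs=["Query"],
)
calculator.add_node(component=PythonRuntime(), name="Calculator", inputs=["CalculatorInput"])  # actual calculator

prompt_node = PromptNode(
    model_name_or_path=prompt_model,
    stop_words=["Observation:"],
)
agent = Agent(prompt_node=prompt_node)
# Nodes and pipelines can be added as tools to the Agent, just as nodes can be added to pipelines with add_node()
agent.add_tool("Search", search, "useful for when you need to answer questions about current events. You should ask targeted questions")
agent.add_tool("Calculator", calculator, "useful for when you need to answer questions about math")
result = agent.run("What is 2 to the power of 3?")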
Example YAML file:
version: ignore
components:
  - name: AgentPromptNode
    type: PromptNode
    params:
      model_name_or_path: DavinciModel
      stop_words: ['Observation:']
  - name: DavinciModel
    type: PromptModel
    params:
      model_name_or_path: 'text-davinci-003'
      api_key: 'XYZ'
  - name: Serp
    type: SerpAPIComponent
    params:
      api_key: 'XYZ'
  - name: CalculatorInput
    type: PromptNode
    params:
      model_name_or_path: DavinciModel
      default_prompt_template: CalculatorTemplate
      output_variable: python_runtime_input
  - name: Calculator
    type: PythonRuntime
  - name: CalculatorTemplate
    type: PromptTemplate
    params:
      name: calculator
      prompt_text: |
        # Write a simple python function that calculates
        # $query
        # Do not print the result; invoke the function and assign the result to final_result variable
        # Start with import statement
pipelines:
  - name: calculator_pipeline
    nodes:
      - name: CalculatorInput
        inputs: [Query]
      - name: Calculator
        inputs: [CalculatorInput]
agents:
  - name: agent
    params:
      prompt_node: AgentPromptNode
      tools:
        - name: Search
          pipeline_or_node: Serp
          description: >
            useful for when you need to answer questions about current events.
            You should ask targeted questions
        - name: Calculator
          pipeline_or_node: calculator_pipeline
          description: >
            useful for when you need to answer questions about math
Loading the Agent from the YAML file:
agent = Agent.load_from_yaml(
"test.mrkl.haystack-pipeline.yml", agent_name="agent"
)
Pipelines, agents, nodes, and tools all implement run and run_batch methods, which is the minimal contract.
At the moment, tools are either pipelines or nodes but we can imagine more types of tools as long as they implement that minimal contract.
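One way to express such a minimal contract in Python is a structural type. The sketch below uses `typing.Protocol`; the name `Runnable` and the simplified signatures are assumptions made for illustration and do not match the actual `run` and `run_batch` signatures of Haystack pipelines and nodes.

```python
from typing import Any, Dict, List, Protocol, runtime_checkable


@runtime_checkable
class Runnable(Protocol):
    """Hypothetical minimal contract: anything with run and run_batch."""

    def run(self, query: str) -> Dict[str, Any]: ...

    def run_batch(self, queries: List[str]) -> List[Dict[str, Any]]: ...


class EchoTool:
    """Toy tool that satisfies the contract without inheriting from anything."""

    def run(self, query: str) -> Dict[str, Any]:
        return {"output": query}

    def run_batch(self, queries: List[str]) -> List[Dict[str, Any]]:
        return [self.run(query) for query in queries]


class NotATool:
    """Lacks run_batch, so it does not satisfy the contract."""

    def run(self, query: str) -> Dict[str, Any]:
        return {"output": query}


echo_is_tool = isinstance(EchoTool(), Runnable)
other_is_tool = isinstance(NotATool(), Runnable)
```

With `runtime_checkable`, `isinstance` only checks that both methods exist, not their signatures, so this is a lightweight guard rather than full validation.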
Drawbacks
Although the scope of the initial Agent is limited, it can grow into a full-fledged framework consisting of various types of agents (conversation, Robotic Process Automation etc.). The field of agents is rapidly growing, and we should be aware that it can even outgrow Haystack in the future. Perhaps we can start with the Agent being part of Haystack and potentially create a new project in the future.
The central building blocks of an Agent are the PromptNode and a set of "neural attachments" that extend the agent's capabilities. Many tools like Search, Calculator, Notion and API connectors are somewhat different conceptually from the existing Haystack components. On the other hand, some of the existing Haystack components fit naturally into the framework of tools, for example, DocumentStore, Retriever, and Reader.
There is a non-negligible risk that the implementation cost of such an agent framework grows and draws resources away from the existing Haystack core. However, as LLM-based agents are an exciting and rapidly growing field, they may raise awareness of Haystack significantly.
Alternatives
We considered an alternative design where the Agent is just another node or pipeline. However, we decided to introduce it as a separate concept for the sake of user-friendliness and clearer code. While a Pipeline is a collection of Nodes, an Agent is a collection of tools, which are Pipelines or Nodes. Nodes in a Pipeline have a pre-defined execution order, whereas the execution order of the tools in an Agent is chosen at runtime by an LLM.
Regarding the name Agent, we considered several alternatives and prefer Agent for its simplicity. Alternative names:
- MRKLAgent
- LLMOrchestrator
- LLMChain
- Toolchain (fits nicely with tools and toolchains in software)
- PipelineComposer / LLMComposer
- PipelineComposition / LLMComposition
- Interesting naming tidbits:
Adoption strategy
Introducing the Agent concept is a rather big change that would require a careful adoption strategy. We would need a lot more documentation explaining these new concepts, and each tool that can be attached to an Agent would need additional documentation.
However, existing Haystack users, especially advanced users, have already requested that an agent framework be added to Haystack. We anticipate that advanced users will be the first to adopt the Agent.
Using an Agent requires an OpenAI API key, and some tools require additional API keys, for example, SerpAPI. However, free trials are available.
The debugging output of the Agent will help users to better understand how it works. In a debugger, the agent works as any other Haystack pipeline containing a prompt node.
How we teach this
Adding agents to Haystack would require a lot of documentation changes, perhaps even separate documentation for MRKL and other future agents that is somewhat detached from Haystack.
We can teach existing Haystack users about agents and agent tools in a new section of the documentation. We can also organize Discord office hours, tutorials, and webinars to teach the new concepts.
Unresolved questions
Name of the parameter pipeline_or_node
- When we add a tool to the agent, we need to specify the name of the pipeline or node (component) to add. This parameter could be called pipeline_or_node, pipeline_or_component_name, etc.
Umbrella Term for Pipeline and Agent
- We need a term that captures pipelines and agents for communication with users (NLP application, flow, system, service, engine ...). Let's have that conversation separately from this proposal.
Tools we imagine in the near future
- Tools will be discussed in a separate proposal.