{
  "label": "Basics",
  "position": 2,
  "link": {
    "type": "generated-index",
    "description": "Basic concepts."
  }
}

---
sidebar_position: 2
slug: /what_is_agent_context_engine
---

# What is an Agent context engine?

In 2025, a silent revolution began beneath the dazzling surface of AI Agents. While the world marveled at agents that could write code, analyze data, and automate workflows, a fundamental bottleneck emerged: why do even the most advanced agents still stumble on simple questions, forget previous conversations, or misuse available tools?

The answer lies not in the intelligence of the Large Language Model (LLM) itself, but in the quality of the context it receives. An LLM, no matter how powerful, is only as good as the information we feed it. Today’s cutting-edge agents are often crippled by a cumbersome, manual, and error-prone process of context assembly, a process known as Context Engineering.

This is where the Agent Context Engine comes in. It is not merely an incremental improvement but a foundational shift, representing the evolution of RAG from a single technique into the core data and intelligence substrate for the entire Agent ecosystem.

## Beyond the hype: The reality of today's "intelligent" Agents

Today, the “intelligence” behind most AI Agents hides a mountain of human labor. Developers must:

- Hand-craft elaborate prompt templates
- Hard-code document-retrieval logic for every task
- Juggle tool descriptions, conversation history, and knowledge snippets inside a tiny context window
- Repeat the whole process for each new scenario

This pattern is called Context Engineering. It is deeply tied to expert know-how, almost impossible to scale, and prohibitively expensive to maintain. When an enterprise needs to keep dozens of distinct agents alive, the artisanal workshop model collapses under its own weight.

The mission of an Agent Context Engine is to turn Context Engineering from an “art” into an industrial-grade science.

## Deconstructing the Agent Context Engine

So, what exactly is an Agent Context Engine? It is a unified, intelligent, and automated platform responsible for the end-to-end process of assembling the optimal context for an LLM or Agent at the moment of inference. It moves from artisanal crafting to industrialized production.

At its core, an Agent Context Engine is built on a triumvirate of next-generation retrieval capabilities, seamlessly integrated into a single service layer (a minimal sketch follows this list):

1. The Knowledge Core (Advanced RAG): This is the evolution of traditional RAG. It moves beyond simple chunk-and-embed to intelligently process static, private enterprise knowledge. Techniques like TreeRAG (building LLM-generated document outlines for "locate-then-expand" retrieval) and GraphRAG (extracting entity networks to find semantically distant connections) work to close the "semantic gap." The engine’s Ingestion Pipeline acts as the ETL for unstructured data, parsing multi-format documents and using LLMs to enrich content with summaries, metadata, and structure before indexing.

2. The Memory Layer: An Agent’s intelligence is defined by its ability to learn from interaction. The Memory Layer is a specialized retrieval system for dynamic, episodic data: conversation history, user preferences, and the agent’s own internal state (e.g., "waiting for human input"). It manages the lifecycle of this data: storing raw dialogue, triggering summarization into semantic memory, and retrieving relevant past interactions to provide continuity and personalization. Technologically, it is a close sibling to RAG, but focused on a temporal stream of data.

3. The Tool Orchestrator: As MCP (Model Context Protocol) enables the connection of hundreds of internal services as tools, a new problem arises: tool selection. The Context Engine solves this with Tool Retrieval. Instead of dumping all tool descriptions into the prompt, it maintains an index of tools and, critically, an index of Playbooks or Guidelines (best practices on when and how to use tools). For a given task, it retrieves only the most relevant tools and instructions, transforming the LLM’s job from "searching a haystack" to "following a recipe."

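To make this division of labor concrete, here is a minimal, hypothetical sketch of how the three retrieval paths might converge in one engine. Everything in it is an illustrative assumption rather than RAGFlow's actual API: the toy `embed` function stands in for a real embedding model, and the three stores are plain in-memory lists of (vector, text) pairs.

```python
import math
from dataclasses import dataclass, field


def embed(text: str) -> list[float]:
    # Toy stand-in for a real embedding model: a character-frequency vector.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - 97] += 1.0
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], store: list, k: int = 3) -> list[str]:
    # store holds (vector, payload) pairs; return the k most similar payloads.
    ranked = sorted(store, key=lambda pair: -cosine(query_vec, pair[0]))
    return [payload for _, payload in ranked[:k]]


@dataclass
class ContextEngine:
    knowledge: list = field(default_factory=list)  # chunks of ingested documents
    memory: list = field(default_factory=list)     # summaries of past interactions
    tools: list = field(default_factory=list)      # tool descriptions and playbooks

    def assemble(self, query: str) -> str:
        # One query fans out to all three retrieval paths; the hits are
        # stitched into a single context block handed to the LLM.
        q = embed(query)
        sections = [
            "## Knowledge", *top_k(q, self.knowledge),
            "## Memory", *top_k(q, self.memory),
            "## Tools", *top_k(q, self.tools, k=2),
        ]
        return "\n".join(sections)
```

The point is architectural rather than algorithmic: the agent asks one engine for its context, instead of developers hand-wiring knowledge retrieval, memory lookup, and tool selection into every prompt.
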
## Why do we need a dedicated engine? The case for a unified substrate

The necessity of an Agent Context Engine becomes clear when we examine the alternative: siloed, manually wired components.

- The Data Silo Problem: Knowledge, memory, and tools reside in separate systems, requiring complex integration for each new agent.
- The Assembly Line Bottleneck: Developers spend more time on context plumbing than on agent logic, slowing innovation to a crawl.
- The "Context Ownership" Dilemma: In manually engineered systems, context logic is buried in code, owned by developers, and opaque to business users. An Engine makes context a configurable, observable, and customer-owned asset.

The shift from Context Engineering to a Context Platform/Engine marks the maturation of enterprise AI, as summarized in the table below:

| Dimension | Context engineering (present) | Context engine/platform (future) |
| ------------------- | --------------------------------------------------------------------------- | ------------------------------------------------------------------------------------- |
| Context creation | Manual, artisanal work by developers and prompt engineers. | Automated, driven by intelligent ingestion pipelines and configurable rules. |
| Context delivery | Hard-coded prompts and static retrieval logic embedded in agent workflows. | Dynamic, real-time retrieval and assembly based on the agent's live state and intent. |
| Context maintenance | A development and operational burden, logic locked in code. | A manageable platform function, with visibility and control returned to the business. |

## RAGFlow: A resolute march toward the Agent context engine

This is the future RAGFlow is forging.

We left behind the label of “yet another RAG system” long ago. From DeepDoc, our deeply optimized multimodal document parser, to the bleeding-edge architectures that bridge semantic chasms in complex RAG scenarios, all the way to a full-blown, enterprise-grade ingestion pipeline, every evolutionary step RAGFlow takes is a deliberate stride toward the ultimate form: an Agentic Context Engine.

We believe tomorrow’s enterprise AI advantage will hinge not on who owns the largest model, but on who can feed that model the highest-quality, most real-time, and most relevant context. An Agentic Context Engine is the critical infrastructure that turns this vision into reality.

In the paradigm shift from “hand-crafted prompts” to “intelligent context,” RAGFlow is determined to be the most steadfast driving force and enabler. We invite every developer, enterprise, and researcher who cares about the future of AI agents to follow RAGFlow’s journey, so that together we can witness and build the cornerstone of the next-generation AI stack.

---
sidebar_position: 1
slug: /what_is_rag
---

# What is Retrieval-Augmented Generation (RAG)?

Since large language models (LLMs) became the focus of technology, their ability to handle general knowledge has been astonishing. However, when questions shift to internal corporate documents, proprietary knowledge bases, or real-time data, the limitations of LLMs become glaringly apparent: they cannot access private information outside their training data. Retrieval-Augmented Generation (RAG) was born precisely to address this core need. Before the LLM generates an answer, the system first retrieves the most relevant context from an external knowledge base and supplies it as "reference material" to the LLM, thereby guiding it to produce accurate answers. In short, RAG elevates LLMs from "relying on memory" to "having evidence to rely on," significantly improving their accuracy and trustworthiness in specialized fields and real-time information queries.

## Why is RAG important?

Although LLMs excel in language understanding and generation, they have inherent limitations:

- Static Knowledge: The model's knowledge is based on a data snapshot from its training time and cannot be automatically updated, making it difficult to perceive the latest information.
- Blind Spot to External Data: They cannot directly access corporate private documents, real-time information streams, or domain-specific content.
- Hallucination Risk: When lacking accurate evidence, they may still fabricate plausible-sounding but false answers to maintain conversational fluency.

The introduction of RAG provides LLMs with real-time, credible "factual grounding." Its core mechanism is divided into two stages:

- Retrieval Stage: Based on the user's question, quickly retrieve the most relevant documents or data fragments from an external knowledge base.
- Generation Stage: The LLM organizes and generates the final answer by incorporating the retrieved information as context, combined with its own linguistic capabilities.

This upgrades LLMs from "speaking from memory" to "speaking with documentation," significantly enhancing reliability in professional and enterprise-level applications.

## How does RAG work?

Retrieval-Augmented Generation enables LLMs to generate higher-quality responses by leveraging real-time, external, or private data sources through the introduction of an information retrieval mechanism. Its workflow can be divided into the following key steps:

### Data processing and vectorization

The knowledge required by RAG comes from unstructured data in various formats, such as documents, database records, or API return content. This data typically needs to be chunked, then transformed into vectors via an embedding model, and stored in a vector database.

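As a concrete illustration of this step, here is a toy indexing pipeline. The fixed-size, overlapping splitter and the character-frequency `embed` function are deliberate stand-ins for a real text splitter and embedding model, and none of the names come from a specific library.

```python
# Toy indexing pipeline: split a document into overlapping chunks,
# embed each chunk, and keep (vector, text) pairs in an in-memory "database".

def chunk(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text: str) -> list[float]:
    # Stand-in for a real embedding model: a character-frequency vector.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - 97] += 1.0
    return vec


vector_store: list[tuple[list[float], str]] = []


def index_document(doc: str) -> None:
    for piece in chunk(doc):
        vector_store.append((embed(piece), piece))
```
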
Why is Chunking Needed? Indexing entire documents directly faces the following problems:

- Decreased Retrieval Precision: Vectorizing long documents leads to semantic "averaging," losing details.
- Context Length Limitation: LLMs have a finite context window, requiring filtering of the most relevant parts for input.
- Cost and Efficiency: Embedding computation and retrieval costs are higher for long texts.

Therefore, an intelligent chunking strategy is key to balancing information integrity, retrieval granularity, and computational efficiency.

### Retrieve relevant information

The user's query is also converted into a vector to perform semantic relevance searches (e.g., calculating cosine similarity) in the vector database, matching and recalling the most relevant text fragments.

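Continuing the toy pipeline above: the query is embedded with the same model, and stored chunks are ranked by cosine similarity. In production, the linear scan below would be replaced by a vector-database query.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query: str, k: int = 3) -> list[str]:
    # embed() and vector_store are defined in the indexing sketch above.
    q = embed(query)
    ranked = sorted(vector_store, key=lambda pair: -cosine(q, pair[0]))
    return [text for _, text in ranked[:k]]
```
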
### Context construction and answer generation

The retrieved relevant content is added to the LLM's context as factual grounding, and the LLM finally generates the answer. Therefore, RAG can be seen as Context Engineering 1.0 for automated context construction.

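To close the loop on the toy pipeline, the sketch below stitches the retrieved chunks into a prompt. `call_llm` is a placeholder for whatever chat-completion client is in use, and the prompt wording is only illustrative.

```python
def call_llm(prompt: str) -> str:
    # Placeholder: swap in a real chat-completion client here.
    raise NotImplementedError


def answer(query: str) -> str:
    # retrieve() is defined in the retrieval sketch above.
    context = "\n\n".join(retrieve(query))
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return call_llm(prompt)
```
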
## Deep dive into existing RAG architecture: beyond vector retrieval

An industrial-grade RAG system is far from being as simple as "vector search + LLM"; its complexity and challenges are primarily embedded in the retrieval process.

### Data complexity: multimodal document processing

Core Challenge: Corporate knowledge mostly exists in the form of multimodal documents containing text, charts, tables, and formulas. Simple OCR extraction loses a large amount of semantic information.

Advanced Practice: Leading solutions, such as RAGFlow, tend to use Vision Language Models (VLMs) or specialized parsing models like DeepDoc to "translate" multimodal documents into unimodal text rich in structural and semantic information. Converting multimodal information into high-quality unimodal text has become standard practice for advanced RAG.

### The complexity of chunking: the trade-off between precision and context

A simple "chunk-embed-retrieve" pipeline has an inherent contradiction:

- Semantic Matching requires small text chunks to ensure clear semantic focus.
- Context Understanding requires large text chunks to ensure complete and coherent information.

This forces system design into a difficult trade-off between "precise but fragmented" and "complete but vague."

Advanced Practice: Leading solutions, such as RAGFlow, employ semantic enhancement techniques like constructing semantic tables of contents and knowledge graphs. These not only address semantic fragmentation caused by physical chunking but also enable the discovery of relevant content across documents based on entity-relationship networks.

### Why is a vector database insufficient for serving RAG?

Vector databases excel at semantic similarity search, but RAG requires precise and reliable answers, demanding more capabilities from the retrieval system:

- Hybrid Search: Relying solely on vector retrieval may miss exact keyword matches (e.g., product codes, regulation numbers). Hybrid search, combining vector retrieval with keyword retrieval (BM25), ensures both semantic breadth and keyword precision (a fusion sketch follows below).
- Tensor or Multi-Vector Representation: To support cross-modal data, employing tensor or multi-vector representation has become an important trend.
- Metadata Filtering: Filtering based on attributes like date, department, and type is a rigid requirement in business scenarios.

Therefore, the retrieval layer of RAG is a composite system: it is based on vector search but must integrate capabilities like full-text search, re-ranking, and metadata filtering.

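As one concrete way to combine keyword and vector results, here is a sketch of reciprocal rank fusion (RRF), a widely used fusion method. The two input rankings are assumed to come from a BM25 index and a vector index, respectively; the names are illustrative.

```python
# Reciprocal rank fusion: merge a BM25 ranking and a vector-search ranking.
# Each input is a list of document IDs, ordered from most to least relevant.

def rrf_fuse(bm25_ids: list[str], vector_ids: list[str], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in (bm25_ids, vector_ids):
        for rank, doc_id in enumerate(ranking):
            # Earlier ranks contribute more; k damps the head of each list.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)


# Documents found by both retrievers float to the top:
print(rrf_fuse(["d3", "d1", "d7"], ["d1", "d9", "d3"]))
# -> ['d1', 'd3', 'd9', 'd7']
```
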
## RAG and memory: shared roots, different streams

Within the agent framework, the essence of the memory mechanism is the same as RAG: both retrieve relevant information from storage based on current needs. The key difference lies in the data source:

- RAG: Targets pre-existing static or dynamic private data provided by the user in advance (e.g., documents, databases).
- Memory: Targets dynamic data generated or perceived by the agent in real time during interaction (e.g., conversation history, environmental state, tool execution results).

They are highly consistent at the technical base (e.g., vector retrieval, keyword matching) and can be seen as the same retrieval capability applied in different scenarios ("existing knowledge" vs. "interaction memory"). A complete agent system often includes both a RAG module for inherent knowledge and a Memory module for interaction history.

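Because the technical base is shared, one ranking routine can serve both modules; only the store differs. A minimal illustration, reusing the toy `embed` and `cosine` helpers from the earlier sketches (all names are assumptions):

```python
# Same retrieval capability, two data sources.
knowledge_store: list = []  # (vector, text) from documents ingested in advance
memory_store: list = []     # (vector, text) from the agent's own interactions

def recall(query: str, store: list, k: int = 3) -> list[str]:
    q = embed(query)
    return [text for _, text in sorted(store, key=lambda p: -cosine(q, p[0]))[:k]]

# RAG looks up pre-existing knowledge; memory looks up interaction history.
facts = recall("refund policy", knowledge_store)
episodes = recall("refund policy", memory_store)
```
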
## RAG applications

RAG has demonstrated clear value in several typical scenarios:

1. Enterprise Knowledge Q&A and Internal Search

   By vectorizing corporate private data and combining it with an LLM, RAG can directly return natural language answers based on authoritative sources, rather than document lists. While meeting intelligent Q&A needs, it inherently aligns with corporate requirements for data security, access control, and compliance.

2. Complex Document Understanding and Professional Q&A

   For structurally complex documents like contracts and regulations, the value of RAG lies in its ability to generate accurate, verifiable answers while maintaining context integrity. Its system accuracy largely depends on text chunking and semantic understanding strategies.

3. Dynamic Knowledge Fusion and Decision Support

   In business scenarios requiring the synthesis of information from multiple sources, RAG evolves into a knowledge orchestration and reasoning support system for business decisions. Through a multi-path recall mechanism, it fuses knowledge from different systems and formats, maintaining factual consistency and logical controllability during the generation phase.

## The future of RAG

The evolution of RAG is unfolding along several clear paths:

1. RAG as the data foundation for Agents

   RAG relates to agents as architecture relates to scenario: for agents to achieve autonomous and reliable decision-making and execution, they must rely on accurate and timely knowledge. RAG provides them with a standardized capability to access private-domain knowledge and is an inevitable choice for building knowledge-aware agents.

2. Advanced RAG: Using LLMs to optimize retrieval itself

   The core feature of next-generation RAG is fully utilizing the reasoning capabilities of LLMs to optimize the retrieval process, such as rewriting queries, summarizing or fusing results, or implementing intelligent routing (a query-rewriting sketch follows this list). Empowering every aspect of retrieval with LLMs is key to breaking through current performance bottlenecks.

3. Towards context engineering 2.0

   Current RAG can be viewed as Context Engineering 1.0, whose core is assembling static knowledge context for single Q&A tasks. The forthcoming Context Engineering 2.0 will extend outward with RAG technology at its core, becoming a system that automatically and dynamically assembles comprehensive context for agents. The context fused by this system will come not only from documents but will also include interaction memory, available tools/skills, and real-time environmental information. This marks the transition of agent development from a "handicraft workshop" model to the industrial era of automated context engineering.

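As an illustration of the second path, here is a hedged sketch of query rewriting before recall. `call_llm` is again a placeholder for a real chat-completion client, and the prompt and function names are assumptions for illustration.

```python
# Query rewriting: ask the LLM to rephrase a vague question into several
# focused search queries, retrieve with each, and merge the results.

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # swap in a real chat-completion client


def rewrite_queries(question: str, n: int = 3) -> list[str]:
    prompt = (
        f"Rewrite the question below into {n} short, self-contained search "
        f"queries, one per line.\n\nQuestion: {question}"
    )
    return [line.strip() for line in call_llm(prompt).splitlines() if line.strip()]


def multi_query_retrieve(question: str) -> list[str]:
    seen, merged = set(), []
    for q in rewrite_queries(question):
        for text in retrieve(q):  # retrieve() from the toy pipeline above
            if text not in seen:
                seen.add(text)
                merged.append(text)
    return merged
```
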
The essence of RAG is to build a dedicated, efficient, and trustworthy external data interface for large language models; its core is Retrieval, not Generation. Starting from the practical need to solve private data access, its technical depth is reflected in the optimization of retrieval for complex unstructured data. With its deep integration into agent architectures and its development towards automated context engineering, RAG is evolving from a technology that improves Q&A quality into the core infrastructure for building the next generation of trustworthy, controllable, and scalable intelligent applications.