### What problem does this PR solve?

This revision performed a comprehensive check on LightRAG to ensure the correctness of its implementation. It **did not involve** Entity Resolution and Community Reports Generation. There is an example using default entity types and the General chunking method, which shows good results in both time and effectiveness. Moreover, response caching is enabled for resuming failed tasks.

[The-Necklace.pdf](https://github.com/user-attachments/files/22042432/The-Necklace.pdf)

After:

```bash
Begin at: Fri, 29 Aug 2025 16:48:03 GMT
Duration: 222.31 s
Progress:
16:48:04 Task has been received.
16:48:06 Page(1~7): Start to parse.
16:48:06 Page(1~7): OCR started
16:48:08 Page(1~7): OCR finished (1.89s)
16:48:11 Page(1~7): Layout analysis (3.72s)
16:48:11 Page(1~7): Table analysis (0.00s)
16:48:11 Page(1~7): Text merged (0.00s)
16:48:11 Page(1~7): Finish parsing.
16:48:12 Page(1~7): Generate 7 chunks
16:48:12 Page(1~7): Embedding chunks (0.29s)
16:48:12 Page(1~7): Indexing done (0.04s). Task done (7.84s)
16:48:17 Start processing for f421fb06849e11f0bdd32724b93a52b2: She had no dresses, no je...
16:48:17 Start processing for f421fb06849e11f0bdd32724b93a52b2: Her husband, already half...
16:48:17 Start processing for f421fb06849e11f0bdd32724b93a52b2: And this life lasted ten ...
16:48:17 Start processing for f421fb06849e11f0bdd32724b93a52b2: Then she asked, hesitatin...
16:49:30 Completed processing for f421fb06849e11f0bdd32724b93a52b2: She had no dresses, no je... after 1 gleanings, 21985 tokens.
16:49:30 Entities extraction of chunk 3 1/7 done, 12 nodes, 13 edges, 21985 tokens.
16:49:40 Completed processing for f421fb06849e11f0bdd32724b93a52b2: Finally, she replied, hes... after 1 gleanings, 22584 tokens.
16:49:40 Entities extraction of chunk 5 2/7 done, 19 nodes, 19 edges, 22584 tokens.
16:50:02 Completed processing for f421fb06849e11f0bdd32724b93a52b2: Then she asked, hesitatin... after 1 gleanings, 24610 tokens.
16:50:02 Entities extraction of chunk 0 3/7 done, 16 nodes, 28 edges, 24610 tokens.
16:50:03 Completed processing for f421fb06849e11f0bdd32724b93a52b2: And this life lasted ten ... after 1 gleanings, 24031 tokens.
16:50:04 Entities extraction of chunk 1 4/7 done, 24 nodes, 22 edges, 24031 tokens.
16:50:14 Completed processing for f421fb06849e11f0bdd32724b93a52b2: So they begged the jewell... after 1 gleanings, 24635 tokens.
16:50:14 Entities extraction of chunk 6 5/7 done, 27 nodes, 26 edges, 24635 tokens.
16:50:29 Completed processing for f421fb06849e11f0bdd32724b93a52b2: Her husband, already half... after 1 gleanings, 25758 tokens.
16:50:29 Entities extraction of chunk 2 6/7 done, 25 nodes, 35 edges, 25758 tokens.
16:51:35 Completed processing for f421fb06849e11f0bdd32724b93a52b2: The Necklace By Guy de Ma... after 1 gleanings, 27491 tokens.
16:51:35 Entities extraction of chunk 4 7/7 done, 39 nodes, 37 edges, 27491 tokens.
16:51:35 Entities and relationships extraction done, 147 nodes, 177 edges, 171094 tokens, 198.58s.
16:51:35 Entities merging done, 0.01s.
16:51:35 Relationships merging done, 0.01s.
16:51:35 ignored 7 relations due to missing entities.
16:51:35 generated subgraph for doc f421fb06849e11f0bdd32724b93a52b2 in 198.68 seconds.
16:51:35 run_graphrag f421fb06849e11f0bdd32724b93a52b2 graphrag_task_lock acquired
16:51:35 set_graph removed 0 nodes and 0 edges from index in 0.00s.
16:51:35 Get embedding of nodes: 9/147
16:51:35 Get embedding of nodes: 109/147
16:51:37 Get embedding of edges: 9/170
16:51:37 Get embedding of edges: 109/170
16:51:40 set_graph converted graph change to 319 chunks in 4.21s.
16:51:40 Insert chunks: 4/319
16:51:40 Insert chunks: 104/319
16:51:40 Insert chunks: 204/319
16:51:40 Insert chunks: 304/319
16:51:40 set_graph added/updated 147 nodes and 170 edges from index in 0.53s.
16:51:40 merging subgraph for doc f421fb06849e11f0bdd32724b93a52b2 into the global graph done in 4.79 seconds.
16:51:40 Knowledge Graph done (204.29s)
```

Before:

```bash
Begin at: Fri, 29 Aug 2025 17:00:47 GMT
processDuration: 173.38 s
Progress:
17:00:49 Task has been received.
17:00:51 Page(1~7): Start to parse.
17:00:51 Page(1~7): OCR started
17:00:53 Page(1~7): OCR finished (1.82s)
17:00:57 Page(1~7): Layout analysis (3.64s)
17:00:57 Page(1~7): Table analysis (0.00s)
17:00:57 Page(1~7): Text merged (0.00s)
17:00:57 Page(1~7): Finish parsing.
17:00:57 Page(1~7): Generate 7 chunks
17:00:57 Page(1~7): Embedding chunks (0.31s)
17:00:57 Page(1~7): Indexing done (0.03s). Task done (7.88s)
17:00:57 created task graphrag
17:01:00 Task has been received.
17:02:17 Entities extraction of chunk 1 1/7 done, 9 nodes, 9 edges, 10654 tokens.
17:02:31 Entities extraction of chunk 2 2/7 done, 12 nodes, 13 edges, 11066 tokens.
17:02:33 Entities extraction of chunk 4 3/7 done, 9 nodes, 10 edges, 10433 tokens.
17:02:42 Entities extraction of chunk 5 4/7 done, 11 nodes, 14 edges, 11290 tokens.
17:02:52 Entities extraction of chunk 6 5/7 done, 13 nodes, 15 edges, 11039 tokens.
17:02:55 Entities extraction of chunk 3 6/7 done, 14 nodes, 13 edges, 11466 tokens.
17:03:32 Entities extraction of chunk 0 7/7 done, 19 nodes, 18 edges, 13107 tokens.
17:03:32 Entities and relationships extraction done, 71 nodes, 89 edges, 79055 tokens, 149.66s.
17:03:32 Entities merging done, 0.01s.
17:03:32 Relationships merging done, 0.01s.
17:03:32 ignored 1 relations due to missing entities.
17:03:32 generated subgraph for doc b1d9d3b6848711f0aacd7ddc0714c4d3 in 149.69 seconds.
17:03:32 run_graphrag b1d9d3b6848711f0aacd7ddc0714c4d3 graphrag_task_lock acquired
17:03:32 set_graph removed 0 nodes and 0 edges from index in 0.00s.
17:03:32 Get embedding of nodes: 9/71
17:03:33 Get embedding of edges: 9/88
17:03:34 set_graph converted graph change to 161 chunks in 2.27s.
17:03:34 Insert chunks: 4/161
17:03:34 Insert chunks: 104/161
17:03:34 set_graph added/updated 71 nodes and 88 edges from index in 0.28s.
17:03:34 merging subgraph for doc b1d9d3b6848711f0aacd7ddc0714c4d3 into the global graph done in 2.60 seconds.
17:03:34 Knowledge Graph done (153.18s)
```

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
- [x] Performance Improvement
RAGFlow Sandbox
A secure, pluggable code execution backend for RAGFlow and beyond.
🔧 Features
- ✅ Seamless RAGFlow Integration: Out-of-the-box compatibility with the `code` component.
- 🔐 High Security: Leverages gVisor for syscall-level sandboxing.
- 🔧 Customizable Sandboxing: Easily modify `seccomp` settings as needed.
- 🧩 Pluggable Runtime Support: Easily extend to support any programming language.
- ⚙️ Developer Friendly: Get started with a single command using `Makefile`.
🏗 Architecture
🚀 Quick Start
📋 Prerequisites
Required
- Linux distro compatible with gVisor
- gVisor
- Docker >= `24.0.0`
- Docker Compose >= `v2.26.1` (like RAGFlow)
- uv as package and project manager
Optional (Recommended)
- GNU Make for simplified CLI management
🐳 Build Docker Base Images
We use isolated base images for secure containerized execution:
```bash
# Build base images manually
docker build -t sandbox-base-python:latest ./sandbox_base_image/python
docker build -t sandbox-base-nodejs:latest ./sandbox_base_image/nodejs

# OR use Makefile
make build
```

Then, build the executor manager image:

```bash
docker build -t sandbox-executor-manager:latest ./executor_manager
```
📦 Running with RAGFlow
1. Ensure gVisor is correctly installed.
2. Configure your `.env` in `docker/.env`:
   - Uncomment sandbox-related variables.
   - Enable sandbox profile at the bottom.
3. Add the following line to `/etc/hosts` as recommended:

   ```bash
   127.0.0.1 sandbox-executor-manager
   ```

4. Start RAGFlow service.
🧭 Running Standalone
Manual Setup
1. Initialize environment:

   ```bash
   cp .env.example .env
   ```

2. Launch:

   ```bash
   docker compose -f docker-compose.yml up
   ```

3. Test:

   ```bash
   source .venv/bin/activate
   export PYTHONPATH=$(pwd)
   uv pip install -r executor_manager/requirements.txt
   uv run tests/sandbox_security_tests_full.py
   ```
With Make
```bash
make  # setup + build + launch + test
```
📈 Monitoring
```bash
docker logs -f sandbox-executor-manager  # Manual
make logs                                # With Make
```
🧰 Makefile Toolbox
| Command | Description |
|---|---|
| `make` | Setup, build, launch and test all at once |
| `make setup` | Initialize environment and install uv |
| `make ensure_env` | Auto-create `.env` if missing |
| `make ensure_uv` | Install uv package manager if missing |
| `make build` | Build all Docker base images |
| `make start` | Start services with safe env loading and testing |
| `make stop` | Gracefully stop all services |
| `make restart` | Shortcut for stop + start |
| `make test` | Run full test suite |
| `make logs` | Stream container logs |
| `make clean` | Stop and remove orphan containers and volumes |
🔐 Security
The RAGFlow sandbox is designed to balance security and usability, offering solid protection without compromising developer experience.
✅ gVisor Isolation
At its core, we use gVisor, a user-space kernel, to isolate code execution from the host system. gVisor intercepts and restricts syscalls, offering robust protection against container escapes and privilege escalations.
🔒 Optional seccomp Support (Advanced)
For users who need zero-trust-level syscall control, we support an additional seccomp profile. This feature restricts containers to only a predefined set of system calls, as specified in executor_manager/seccomp-profile-default.json.
⚠️ This feature is disabled by default to maintain compatibility and usability. Enabling it may cause compatibility issues with some dependencies.
To enable seccomp
1. Edit your `.env` file:

   ```bash
   SANDBOX_ENABLE_SECCOMP=true
   ```

2. Customize allowed syscalls in:

   ```bash
   executor_manager/seccomp-profile-default.json
   ```

   This profile is passed to the container with:

   ```bash
   --security-opt seccomp=/app/seccomp-profile-default.json
   ```
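Because the profile is handed to Docker through `--security-opt seccomp=...`, it presumably follows Docker's seccomp JSON layout (a `defaultAction` plus a `syscalls` list of rules with `names` and `action`). Under that assumption, the following sketch shows one way to add syscalls to the allow list programmatically. The helper name `allow_syscalls` is hypothetical and not part of the project.

```python
import json


def allow_syscalls(profile: dict, names: list[str]) -> dict:
    """Return a copy of `profile` with `names` added to an SCMP_ACT_ALLOW rule.

    Assumes Docker's seccomp profile layout; the input dict is not modified.
    """
    updated = json.loads(json.dumps(profile))  # deep copy via JSON round-trip
    rules = updated.setdefault("syscalls", [])

    # Reuse the first unconditional allow rule, or create one if none exists.
    for rule in rules:
        if rule.get("action") == "SCMP_ACT_ALLOW" and not rule.get("args"):
            target = rule
            break
    else:
        target = {"names": [], "action": "SCMP_ACT_ALLOW"}
        rules.append(target)

    existing = target.setdefault("names", [])
    for name in names:
        if name not in existing:
            existing.append(name)
    return updated


# Example with an in-memory profile; in practice you would json.load() the
# file at executor_manager/seccomp-profile-default.json and json.dump() it back.
profile = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [{"names": ["read", "write"], "action": "SCMP_ACT_ALLOW"}],
}
print(json.dumps(allow_syscalls(profile, ["futex"]), indent=2))
```

Since the profile is read when containers are created, changes most likely only take effect after the service is restarted.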
🧠 Python Code AST Inspection
In addition to sandboxing, Python code is statically analyzed via AST (Abstract Syntax Tree) before execution. Potentially malicious code (e.g. file operations, subprocess calls, etc.) is rejected early, providing an extra layer of protection.
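To make the idea concrete, here is a minimal sketch of AST-based pre-execution screening using Python's standard `ast` module. It is not the project's actual analyzer: the deny-lists `BLOCKED_MODULES` and `BLOCKED_CALLS` and the function `find_violations` are hypothetical names chosen for illustration, and a real checker would need to cover many more evasion paths.

```python
import ast

# Hypothetical deny-lists for illustration; the real analyzer's rules may differ.
BLOCKED_MODULES = {"os", "subprocess", "shutil", "socket"}
BLOCKED_CALLS = {"eval", "exec", "open", "__import__"}


def find_violations(source: str) -> list[str]:
    """Return human-readable reasons to reject `source` (empty list if none)."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    violations = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # `import os` or `import os.path`
            for alias in node.names:
                if alias.name.split(".")[0] in BLOCKED_MODULES:
                    violations.append(f"line {node.lineno}: import of '{alias.name}'")
        elif isinstance(node, ast.ImportFrom):
            # `from subprocess import run`
            if (node.module or "").split(".")[0] in BLOCKED_MODULES:
                violations.append(f"line {node.lineno}: import from '{node.module}'")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            # Direct calls to risky built-ins such as `open(...)` or `eval(...)`
            if node.func.id in BLOCKED_CALLS:
                violations.append(f"line {node.lineno}: call to '{node.func.id}()'")
    return violations


print(find_violations("def main(a, b):\n    return a + b\n"))
print(find_violations("import subprocess\nsubprocess.run(['ls'])\n"))
```

Static checks like this are easy to bypass on their own (for example through `getattr` tricks), which is why they only complement the runtime isolation described above.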
This security model strikes a balance between robust isolation and developer usability. While seccomp can be highly restrictive, our default setup aims to keep things usable for most developers — no obscure crashes or cryptic setup required.
📦 Add Extra Dependencies for Supported Languages
Currently, the following languages are officially supported:
| Language | Priority |
|---|---|
| Python | High |
| Node.js | Medium |
🐍 Python
To add Python dependencies, simply edit the following file:
sandbox_base_image/python/requirements.txt
Add any additional packages you need, one per line (just like a normal pip requirements file).
🟨 Node.js
To add Node.js dependencies:
1. Navigate to the Node.js base image directory:

   ```bash
   cd sandbox_base_image/nodejs
   ```

2. Use `npm` to install the desired packages. For example:

   ```bash
   npm install lodash
   ```

3. The dependencies will be saved to `package.json` and `package-lock.json`, and included in the Docker image when rebuilt.
Usage
🐍 A Python example
```python
def main(arg1: str, arg2: str) -> str:
    return f"result: {arg1 + arg2}"
```
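Before pasting a function like this into the code component, it can help to exercise it locally. The harness below is only a sketch: it assumes the sandbox passes inputs as keyword arguments whose names match the parameters of `main`, which is not confirmed here, and `call_main` is a hypothetical helper.

```python
import inspect


def main(arg1: str, arg2: str) -> str:
    return f"result: {arg1 + arg2}"


def call_main(func, arguments: dict):
    """Call `func` with `arguments`, failing early if the names do not line up."""
    params = inspect.signature(func).parameters
    unknown = set(arguments) - set(params)
    # Only parameters without a default value are required.
    missing = {
        name
        for name, param in params.items()
        if param.default is inspect.Parameter.empty and name not in arguments
    }
    if unknown or missing:
        raise TypeError(f"unknown={sorted(unknown)}, missing={sorted(missing)}")
    return func(**arguments)


print(call_main(main, {"arg1": "foo", "arg2": "bar"}))  # prints: result: foobar
```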
🟨 JavaScript examples
A simple sync function
```javascript
function main({arg1, arg2}) {
  return arg1 + arg2
}
```
Async function with axios

```javascript
const axios = require('axios');

async function main() {
  try {
    const response = await axios.get('https://github.com/infiniflow/ragflow');
    return 'Body:' + response.data;
  } catch (error) {
    return 'Error:' + error.message;
  }
}
```
📋 FAQ
❓Sandbox Not Working?
Follow this checklist to troubleshoot:
1. Is your machine compatible with gVisor?

   Ensure that your system supports gVisor. Refer to the gVisor installation guide.

2. Is gVisor properly installed?

   Common error:

   `HTTPConnectionPool(host='sandbox-executor-manager', port=9385): Read timed out.`

   Cause: `runsc` is an unknown or invalid Docker runtime.

   Fix:

   - Install gVisor
   - Restart Docker
   - Test with:

     ```bash
     docker run --rm --runtime=runsc hello-world
     ```

3. Is `sandbox-executor-manager` mapped in `/etc/hosts`?

   Common error:

   `HTTPConnectionPool(host='none', port=9385): Max retries exceeded.`

   Fix:

   Add the following entry to `/etc/hosts`:

   ```bash
   127.0.0.1 es01 infinity mysql minio redis sandbox-executor-manager
   ```

4. Have you enabled sandbox-related configurations in RAGFlow?

   Double-check that all sandbox settings are correctly enabled in your RAGFlow configuration.

5. Have you pulled the required base images for the runners?

   Common error:

   `HTTPConnectionPool(host='sandbox-executor-manager', port=9385): Read timed out.`

   Cause: no runner was started.

   Fix:

   Pull the necessary base images:

   ```bash
   docker pull infiniflow/sandbox-base-nodejs:latest
   docker pull infiniflow/sandbox-base-python:latest
   ```

6. Did you restart the service after making changes?

   Any changes to configuration or environment require a full service restart to take effect.
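Several of the errors above come down to whether `sandbox-executor-manager` resolves and whether its port accepts connections. As a quick diagnostic, the sketch below checks both using only the standard library. The port `9385` is taken from the error messages above; the helper name `check_host` is hypothetical.

```python
import socket


def check_host(hostname: str, port: int, timeout: float = 3.0) -> str:
    """Report whether `hostname` resolves and whether `port` accepts TCP connections."""
    try:
        address = socket.gethostbyname(hostname)
    except socket.gaierror:
        return f"{hostname}: cannot be resolved (check /etc/hosts)"
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return f"{hostname}:{port} is reachable at {address}"
    except OSError as exc:
        return f"{hostname} resolves to {address}, but port {port} is not reachable: {exc}"


if __name__ == "__main__":
    print(check_host("sandbox-executor-manager", 9385))
```

A "cannot be resolved" result points to the `/etc/hosts` entry, while a "not reachable" result suggests the executor manager container is not running or not yet listening.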
❓Container pool is busy?
All available runners are currently in use, executing tasks/running code. Please try again shortly, or consider increasing the pool size in the configuration to improve availability and reduce wait times.
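On the client side, "try again shortly" can be automated with a retry loop and exponential backoff. The sketch below is generic and makes an assumption: it treats a `RuntimeError` raised by the caller-supplied `submit` callable as the "pool is busy" signal, which is a stand-in for however your client actually surfaces that condition.

```python
import time


def run_with_retry(submit, attempts: int = 5, base_delay: float = 0.5, sleep=time.sleep):
    """Call `submit()` until it succeeds or `attempts` tries have been made.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between tries.
    RuntimeError stands in for a hypothetical "container pool is busy" error.
    """
    for attempt in range(attempts):
        try:
            return submit()
        except RuntimeError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


# Example: a fake submit function that is "busy" twice before succeeding.
calls = {"count": 0}


def fake_submit():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("Container pool is busy")
    return "ok"


print(run_with_retry(fake_submit, base_delay=0.01))  # prints: ok
```

If retries are needed often, increasing the pool size as suggested above is the more durable fix.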
🤝 Contribution
Contributions are welcome!