ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2025-06-26 22:19:57 +00:00

Author	SHA1	Message	Date
Yongteng Lei	b705ff08fe	Refa: improve GraphRAG similarity sensitivity to numeric differences (#8479 ) ### What problem does this PR solve? Improve GraphRAG similarity sensitivity to numeric differences. #8444. ### Type of change - [x] Refactoring	2025-06-25 16:20:59 +08:00
Yongteng Lei	24ca4cc6b7	Refa: GraphRAG and explaining GraphRAG stalling behavior on large files (#8223 ) ### What problem does this PR solve? This PR investigates the cause of #7957. TL;DR: Incorrect similarity calculations lead to too many candidates. Since candidate selection involves interaction with the LLM, this causes significant delays in the program. What this PR does: 1. Fix similarity calculation: When processing a 64-page government document, the corrected similarity calculation reduces the number of candidates from over 100,000 to around 16,000. With a default batch size of 100 pairs per LLM call, this fix reduces unnecessary LLM interactions from over 1,000 calls to around 160, a roughly 6x improvement. 2. Add concurrency and timeout limits: Up to 5 entity types are processed in "parallel", each with a 180-second timeout. These limits may be configurable in future updates. 3. Improve logging: The candidate resolution process now reports progress in real time. 4. Mitigate potential concurrency risks. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Refactoring	2025-06-12 19:09:50 +08:00
Stephen Hu	2337bbf6ca	Perf: pass useless check for tidy graph (#8121 ) ### What problem does this PR solve? Allow skipping the attribute check when the upstream caller has already verified the attributes. ### Type of change - [X] Performance Improvement	2025-06-09 11:44:13 +08:00
Stephen Hu	a71376ad6a	Fix: KeyError: 'method' when build run_graphrag (#7899 ) ### What problem does this PR solve? Close #7879 I checked the current master code: the kb_parser_config is joined from the knowledge table, so I think this is an edge case caused by historical data. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-28 11:46:41 +08:00
alkscr	4ae8f87754	Fix: missing graph resolution and community extraction in graphrag tasks (#7586 ) ### What problem does this PR solve? Whether to apply graph resolution and community extraction is stored in `task["kb_parser_config"]`. However, the previous code read `graphrag_conf` from `task["parser_config"]`, so `with_resolution` and `with_community` were always false. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe):	2025-05-13 09:21:03 +08:00
alkscr	ab27609a64	Fix: whole knowledge graph lost after removing any document in the knowledge base (#7151 ) ### What problem does this PR solve? When any document is removed from a knowledge base that uses a knowledge graph, the graph's `removed_kwd` is set to "Y". However, in the function `graphrag.utils.get_graph`, the `rebuild_graph` call is skipped and `None` is returned directly when `removed_kwd=Y`, so the remaining part of the graph is abandoned (although the old entity data still exists in the database). In addition, the Infinity instance skips deleting the `source_id` of graph components when a document is removed, which may produce an incorrect graph after a rebuild. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-04-30 09:43:17 +08:00
liuzhenghua	af770c5ced	perf: Optimize GraphRAG’s LOOP_PROMPT (#7356 ) ### What problem does this PR solve? Currently, GraphRAG’s LOOP_PROMPT causes the model to keep appending entities and relationships even after outputting “Y,” which wastes time. Referring to the latest code in [graphRAG](https://github.com/microsoft/graphrag/blob/main/graphrag/prompts/index/extract_graph.py), I modified the LOOP_PROMPT, and after verification the updated prompt reliably outputs “Y” and stops. ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [x] Performance Improvement - [ ] Other (please describe): Co-authored-by: liuzhenghua-jk <liuzhenghua-jk@360shuke.com>	2025-04-28 13:31:04 +08:00
WhiteBear	2c62652ea8	<think> tag is missing. (#7256 ) ### What problem does this PR solve? Some models force thinking, resulting in the absence of the think tag in the returned content ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-04-24 11:44:10 +08:00
Kevin Hu	f2c9ffc056	Fix: KG search issue. (#7186 ) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-04-22 12:10:30 +08:00
aniaan	8b8a2f2949	fix(nursery): Fix Closure Trap Issues in Trio Concurrent Tasks (#7106 ) ## Problem Description Multiple files in the RAGFlow project contain closure trap issues when using lambda functions with `trio.open_nursery()`. This problem causes concurrent tasks created in loops to reference the same variable, resulting in all tasks processing the same data (the data from the last iteration) rather than each task processing its corresponding data from the loop. ## Issue Details When using a `lambda` to create a closure function and passing it to `nursery.start_soon()` within a loop, the lambda function captures a reference to the loop variable rather than its value. For example: ```python # Problematic code async with trio.open_nursery() as nursery: for d in docs: nursery.start_soon(lambda: doc_keyword_extraction(chat_mdl, d, topn)) ``` In this pattern, when concurrent tasks begin execution, `d` has already become the value after the loop ends (typically the last element), causing all tasks to use the same data. ## Fix Solution Changed the way concurrent tasks are created with `nursery.start_soon()` by leveraging Trio's API design to directly pass the function and its arguments separately: ```python # Fixed code async with trio.open_nursery() as nursery: for d in docs: nursery.start_soon(doc_keyword_extraction, chat_mdl, d, topn) ``` This way, each task uses the parameter values at the time of the function call, rather than references captured through closures. ## Fixed Files Fixed closure traps in the following files: 1. `rag/svr/task_executor.py`: 3 fixes, involving document keyword extraction, question generation, and tag processing 2. `rag/raptor.py`: 1 fix, involving document summarization 3. `graphrag/utils.py`: 2 fixes, involving graph node and edge processing 4. `graphrag/entity_resolution.py`: 2 fixes, involving entity resolution and graph node merging 5. `graphrag/general/mind_map_extractor.py`: 2 fixes, involving document processing 6. `graphrag/general/extractor.py`: 3 fixes, involving content processing and graph node/edge merging 7. `graphrag/general/community_reports_extractor.py`: 1 fix, involving community report extraction ## Potential Impact This fix resolves a serious concurrency issue that could have caused: - Data processing errors (processing duplicate data) - Performance degradation (all tasks working on the same data) - Inconsistent results (some data not being processed) After the fix, all concurrent tasks should correctly process their respective data, improving system correctness and reliability.	2025-04-18 18:00:20 +08:00
Stephen Hu	b1798bafb0	Fix: handle sometimes graph index will miss explanation (#7127 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/7053 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-04-18 14:24:36 +08:00
BUJIQI	627fd002ae	Update utils.py (#7091 ) ### What problem does this PR solve? When there are multiple entities, the variable `v` may be a list, which leads to this error: ``` | File "/mnt/d/wrf/ragflow/ragflow/graphrag/utils.py", line 59, in replace_all | result = result.replace(f"{{{k}}}", v) | TypeError: replace() argument 2 must be str, not list ``` This PR converts `v` to a str. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe):	2025-04-17 17:17:09 +08:00
caiming100	d64c6870bb	Fix:When parsing documents with graph, an error occurred:[ERROR][Exception]: 'method' (#6836 ) [When parsing documents with graph, an error occurred:[ERROR][Exception]: 'method'] (https://github.com/infiniflow/ragflow/issues/6835) ### What problem does this PR solve? Close #6786 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): Co-authored-by: cm <caiming@sict.ac.cn>	2025-04-07 12:29:25 +08:00
Zhichang Yu	fdc410e743	Fix set_graph on non-existing edge (#6777 ) ### What problem does this PR solve? Fix set_graph on non-existing edge ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-04-03 11:09:04 +08:00
Zhichang Yu	e7a2a4b7ff	Log llm response on exception (#6750 ) ### What problem does this PR solve? Log llm response on exception ### Type of change - [x] Refactoring	2025-04-02 17:10:57 +08:00
Yue-Lyu123	20b8ccd1e9	Hotfix ece5903 (#6705 ) I'm really sorry, I found that in graphrag/general/extractor.py under def __call__, the line change.removed_nodes.extend(nodes[1:]) causes an AttributeError: 'set' object has no attribute 'extend'. Could you please merge the branch e666528 again? I made some modifications.	2025-04-01 12:06:28 +08:00
Yue-Lyu123	67330833af	fix: correct [AttributeError: 'set' object has no attribute 'nodes' T… (#6699 ) ### Related Issue: https://github.com/infiniflow/ragflow/issues/6653 ### Environment: Using nightly version [ece5903] Elasticsearch database Thanks for the review! My fault! I realize my initial testing wasn't passed. In graphrag/entity_resolution.py `sub_connect_graph` is a set like` {'HELLO', 'Hi', 'How are you'}`, Neither accessing `.nodes` nor `.nodes()` will work, it still causes `AttributeError: 'set' object has no attribute 'nodes'` In graphrag/general/extractor.py The `list.extend() `method performs an in-place operation, directly modifying the original list and returning ‘None’ rather than the modified list. Neither accessing `sorted(set(node0_attrs[attr].extend(node1_attrs.get(attr, []))))` nor `sorted(set(node0_attrs[attr].extend(node1_attrs[attr])))` will work, it still causes `TypeError: 'NoneType' object is not iterable` ### Type of change - [ ] Bug Fix AttributeError: graphrag/entity_resolution.py - [ ] Bug Fix TypeError: graphrag/general/extractor.py	2025-04-01 09:38:21 +08:00
Yue-Lyu123	ece59034f7	fix: Resolve KnowledgeGraph entity resolution errors (#6653 ) (#6691 ) ### Related Issue: #6653 ### Environment: Using nightly version Elasticsearch database ### Bug Description: When clicking the "Entity Resolution" button in KnowledgeGraph, encountered the following errors: graphrag/entity_resolution.py ``` list(sub_connect_graph.nodes) AttributeError ``` graphrag/general/extractor.py ``` node0_attrs[attr] = sorted(set(node0_attrs[attr].extend(node1_attrs[attr]))) TypeError: 'NoneType' object is not iterable ``` ``` for attr in ["keywords", "source_id"]: KeyError I think attribute "keywords" is in edges not nodes ``` graphrag/utils.py ``` settings.docStoreConn.delete() # Sync function called as async ``` ### Changes Made: Fixed AttributeError in entity_resolution.py by properly handling graph nodes Fixed TypeError and KeyError in extractor.py by separate operations Corrected async/sync mismatch in document deletion call	2025-03-31 22:31:35 +08:00
Zhichang Yu	d32a35d8fd	Fix entity_types. Close #6287 and #6608 (#6632 ) ### What problem does this PR solve? Fix entity_types. Close #6287 and #6608 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-28 15:00:24 +08:00
Zhichang Yu	fe0396bbb9	Introduced delete_knowledge_graph (#6605 ) ### What problem does this PR solve? Introduced delete_knowledge_graph ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] Documentation Update	2025-03-27 17:16:48 +08:00
Zhichang Yu	36b62e0fab	EntityResolution batch. Close #6570 (#6602 ) ### What problem does this PR solve? EntityResolution batch ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-27 16:40:36 +08:00
Zhichang Yu	c4998d0e09	Rename graphrag task lock (#6576 ) ### What problem does this PR solve? Rename graphrag task lock ### Type of change - [x] Refactoring	2025-03-26 23:48:47 +08:00
Zhichang Yu	6bf26e2a81	Optimize graphrag again (#6513 ) ### What problem does this PR solve? Removed set_entity and set_relation to avoid accessing doc engine during graph computation. Introduced GraphChange to avoid writing unchanged chunks. ### Type of change - [x] Performance Improvement	2025-03-26 15:34:42 +08:00
utopia2077	390086c6ab	Fix: split process bug in graphrag extract (#6423 ) ### What problem does this PR solve? 1. The completion delimiter was missing. 2. Bracket handling was missing. 3. The doc_ids returned by update_graph is a set, but the insert operation in extract_community needs a list. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-24 21:41:20 +08:00
Kevin Hu	8b7e53e643	Fix: miss calculate of token number. (#6401 ) ### What problem does this PR solve? #6308 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-21 17:30:38 +08:00
Kevin Hu	9ed004e90d	Refa: control the simi for entity resolution. (#6386 ) ### What problem does this PR solve? #6352 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-21 13:16:34 +08:00
Kevin Hu	1333d3c02a	Fix: float transfer exception. (#6197 ) ### What problem does this PR solve? #6177 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-18 11:13:44 +08:00
Zhichang Yu	c00def5b71	Fix 6030 (#6070 ) ### What problem does this PR solve? Close #6030 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-14 11:29:22 +08:00
Zhichang Yu	e213873852	Optimize graphrag cache get entity (#6018 ) ### What problem does this PR solve? Optimize graphrag cache get entity ### Type of change - [x] Performance Improvement	2025-03-13 14:37:59 +08:00
Zhichang Yu	939e668096	Optimized graphrag again (#5927 ) ### What problem does this PR solve? Optimized graphrag again ### Type of change - [x] Performance Improvement	2025-03-11 18:36:10 +08:00
Zhichang Yu	6ec6ca6971	Refactor graphrag to remove redis lock (#5828 ) ### What problem does this PR solve? Refactor graphrag to remove redis lock ### Type of change - [x] Refactoring	2025-03-10 15:15:06 +08:00
Kevin Hu	1919780880	Refa: reduce default value of MAX_CONCURRENT_CHATS (#5821 ) ### What problem does this PR solve? #5786 ### Type of change - [x] Refactoring	2025-03-10 11:22:06 +08:00
Kevin Hu	06b29d7da4	Fix: empty description (#5747 ) ### What problem does this PR solve? #5705 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-07 11:23:37 +08:00
Kevin Hu	9fc7174612	Fix: too long context during KG issue. (#5723 ) ### What problem does this PR solve? #5088 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-06 19:21:07 +08:00
郭大鹏	78b2e0be89	fix: issue #5600 (#5645 ) fix: issue https://github.com/infiniflow/ragflow/issues/5600 ### What problem does this PR solve? close issue https://github.com/infiniflow/ragflow/issues/5600 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-05 16:50:37 +08:00
Zhichang Yu	f65c3ae62b	Refactored DocumentService.update_progress (#5642 ) ### What problem does this PR solve? Refactored DocumentService.update_progress ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-05 14:48:03 +08:00
Kevin Hu	02c955babb	Fix: parameter error. (#5641 ) ### What problem does this PR solve? #5600 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-05 14:37:51 +08:00
yihong	148a7e7002	fix: issue #5600 (#5620 ) ### What problem does this PR solve? close issue #5600 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: yihong0618 <zouzou0208@gmail.com>	2025-03-05 11:10:04 +08:00
Zhichang Yu	4d6484b03e	Fix nursery.start_soon. Close #5575 (#5591 ) ### What problem does this PR solve? Fix nursery.start_soon. Close #5575 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-04 14:46:54 +08:00
Zhichang Yu	c813c1ff4c	Made task_executor async to speedup parsing (#5530 ) ### What problem does this PR solve? Made task_executor async to speedup parsing ### Type of change - [x] Performance Improvement	2025-03-03 18:59:49 +08:00
Kevin Hu	1a41b92f77	More robust community report. (#5328 ) ### What problem does this PR solve? #5289 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-02-25 12:58:10 +08:00
Kevin Hu	ecf5f6976f	Make node merging parallel. (#5324 ) ### What problem does this PR solve? #5314 ### Type of change - [x] Performance Improvement	2025-02-25 12:02:44 +08:00
Kevin Hu	39b96849a9	Fix window size issue of ES. (#5175 ) ### What problem does this PR solve? #5152 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-02-20 12:54:29 +08:00
Kevin Hu	ef95f08c48	Remove redundant code. (#5121 ) ### What problem does this PR solve? #5107 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-02-19 15:46:52 +08:00
Kevin Hu	84b4b38cbb	Remove <think> for exeSql component. (#5069 ) ### What problem does this PR solve? #5061 #5067 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-02-18 13:39:37 +08:00
Kevin Hu	f46448d04c	Remove <think> for KG extraction. (#5027 ) ### What problem does this PR solve? #4946 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-02-17 14:06:06 +08:00
Kevin Hu	7c90b87715	Fix window size of ES issue. (#5026 ) ### What problem does this PR solve? #5015 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-02-17 12:48:56 +08:00
Kevin Hu	f29da49893	Fix keyerror issue while rebuilding graph. (#5022 ) ### What problem does this PR solve? #4995 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-02-17 12:02:44 +08:00
Kevin Hu	1287558f24	Fix xinference chat role order issue. (#4898 ) ### What problem does this PR solve? #4831 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-02-12 13:15:23 +08:00
Kevin Hu	0d3ed37b48	Make the update script shorter. (#4854 ) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-02-10 18:18:49 +08:00
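
The closure trap described in commit 8b8a2f2949 (#7106) can be reproduced without Trio. The sketch below is only an illustration under stated assumptions: `process`, `docs`, and the `scheduled` list are hypothetical names, and a plain list of deferred calls stands in for the nursery, so it shows the late-binding behavior of `lambda` rather than RAGFlow's actual code.

```python
def process(doc):
    """Hypothetical stand-in for a per-document task such as keyword extraction."""
    return f"processed {doc}"


docs = ["a", "b", "c"]

# Trap: each lambda looks up `d` only when it is finally called,
# which happens after the loop has finished and `d` is the last element.
trapped = []
for d in docs:
    trapped.append(lambda: process(d))
trapped_results = [task() for task in trapped]

# Fix: keep the function and its arguments separate, the same shape as
# nursery.start_soon(fn, *args), so each task is bound to its own value.
scheduled = []
for d in docs:
    scheduled.append((process, (d,)))
fixed_results = [fn(*args) for fn, args in scheduled]

print(trapped_results)  # every entry refers to the last document
print(fixed_results)    # one entry per document
```

Passing the arguments separately avoids the closure entirely; `functools.partial(process, d)` would be another way to bind the value at loop time if a zero-argument callable is required.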


114 Commits