114 Commits

Author SHA1 Message Date
Yongteng Lei
b705ff08fe
Refa: improve GraphRAG similarity sensitivity to numeric differences (#8479)
### What problem does this PR solve?

Improve GraphRAG similarity sensitivity to numeric differences. #8444.

### Type of change

- [x] Refactoring
2025-06-25 16:20:59 +08:00
Yongteng Lei
24ca4cc6b7
Refa: GraphRAG and explaining GraphRAG stalling behavior on large files (#8223)
### What problem does this PR solve?

This PR investigates the cause of #7957.

TL;DR: Incorrect similarity calculations lead to too many candidates.
Since candidate selection involves interaction with the LLM, this causes
significant delays in the program.

What this PR does:

1. **Fix similarity calculation**:
When processing a 64 pages government document, the corrected similarity
calculation reduces the number of candidates from over 100,000 to around
16,000. With a default batch size of 100 pairs per LLM call, this fix
reduces unnecessary LLM interactions from over 1,000 calls to around
160, a roughly 10x improvement.
2. **Add concurrency and timeout limits**: 
Up to 5 entity types are processed in "parallel", each with a 180-second
timeout. These limits may be configurable in future updates.
3. **Improve logging**:
The candidate resolution process now reports progress in real time.
4. **Mitigates potential concurrency risks**


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
2025-06-12 19:09:50 +08:00
Stephen Hu
2337bbf6ca
Perf: pass useless check for tidy graph (#8121)
### What problem does this PR solve?
Support passing the attribute check when the upstream has already made
sure it.

### Type of change
- [X] Performance Improvement
2025-06-09 11:44:13 +08:00
Stephen Hu
a71376ad6a
Fix: KeyError: 'method' when build run_graphrag (#7899)
### What problem does this PR solve?
Close #7879
I checked the current master code, the kb_parser_config is join from
knowledge table, so I think should be some edge cases due to history
data

### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2025-05-28 11:46:41 +08:00
alkscr
4ae8f87754
Fix: missing graph resolution and community extraction in graphrag tasks (#7586)
### What problem does this PR solve?

Info of whether applying graph resolution and community extraction is
storage in `task["kb_parser_config"]`. However, previous code get
`graphrag_conf` from `task["parser_config"]`, making `with_resolution`
and `with_community` are always false.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2025-05-13 09:21:03 +08:00
alkscr
ab27609a64
Fix: whole knowledge graph lost after removing any document in the knowledge base (#7151)
### What problem does this PR solve?

When you removed any document in a knowledge base using knowledge graph,
the graph's `removed_kwd` is set to "Y".
However, in the function `graphrag.utils.get_gaph`, `rebuild_graph`
method is passed and directly return `None` while `removed_kwd=Y`,
making residual part of the graph abandoned (but old entity data still
exist in db).

Besides, infinity instance actually pass deleting graph components'
`source_id` when removing document. It may cause wrong graph after
rebuild.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-30 09:43:17 +08:00
liuzhenghua
af770c5ced
perf: Optimize GraphRAG’s LOOP_PROMPT (#7356)
### What problem does this PR solve?

当前graphrag的LOOP_PROMPT,会导致模型输出Y之后,继续补充了实体和关系,比较浪费时间。参照[graph
rag](https://github.com/microsoft/graphrag/blob/main/graphrag/prompts/index/extract_graph.py)最新的代码,修改了LOOP_PROMPT,经过验证,修改后可以稳定的输出Y停止。

Currently, GraphRAG’s LOOP_PROMPT causes the model to keep appending
entities and relationships even after outputting “Y,” which wastes time.
Referring to the latest code in
[graphRAG](https://github.com/microsoft/graphrag/blob/main/graphrag/prompts/index/extract_graph.py),
I modified the LOOP_PROMPT, and after verification the updated prompt
reliably outputs “Y” and stops.

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [x] Performance Improvement
- [ ] Other (please describe):

Co-authored-by: liuzhenghua-jk <liuzhenghua-jk@360shuke.com>
2025-04-28 13:31:04 +08:00
WhiteBear
2c62652ea8
<think> tag is missing. (#7256)
### What problem does this PR solve?

Some models force thinking, resulting in the absence of the think tag in
the returned content

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-24 11:44:10 +08:00
Kevin Hu
f2c9ffc056
Fix: KG search issue. (#7186)
### What problem does this PR solve?

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-22 12:10:30 +08:00
aniaan
8b8a2f2949
fix(nursery): Fix Closure Trap Issues in Trio Concurrent Tasks (#7106)
## Problem Description
Multiple files in the RAGFlow project contain closure trap issues when
using lambda functions with `trio.open_nursery()`. This problem causes
concurrent tasks created in loops to reference the same variable,
resulting in all tasks processing the same data (the data from the last
iteration) rather than each task processing its corresponding data from
the loop.

## Issue Details
When using a `lambda` to create a closure function and passing it to
`nursery.start_soon()` within a loop, the lambda function captures a
reference to the loop variable rather than its value. For example:

```python
# Problematic code
async with trio.open_nursery() as nursery:
    for d in docs:
        nursery.start_soon(lambda: doc_keyword_extraction(chat_mdl, d, topn))
```

In this pattern, when concurrent tasks begin execution, `d` has already
become the value after the loop ends (typically the last element),
causing all tasks to use the same data.

## Fix Solution
Changed the way concurrent tasks are created with `nursery.start_soon()`
by leveraging Trio's API design to directly pass the function and its
arguments separately:

```python
# Fixed code
async with trio.open_nursery() as nursery:
    for d in docs:
        nursery.start_soon(doc_keyword_extraction, chat_mdl, d, topn)
```

This way, each task uses the parameter values at the time of the
function call, rather than references captured through closures.

## Fixed Files
Fixed closure traps in the following files:

1. `rag/svr/task_executor.py`: 3 fixes, involving document keyword
extraction, question generation, and tag processing
2. `rag/raptor.py`: 1 fix, involving document summarization
3. `graphrag/utils.py`: 2 fixes, involving graph node and edge
processing
4. `graphrag/entity_resolution.py`: 2 fixes, involving entity resolution
and graph node merging
5. `graphrag/general/mind_map_extractor.py`: 2 fixes, involving document
processing
6. `graphrag/general/extractor.py`: 3 fixes, involving content
processing and graph node/edge merging
7. `graphrag/general/community_reports_extractor.py`: 1 fix, involving
community report extraction

## Potential Impact
This fix resolves a serious concurrency issue that could have caused:
- Data processing errors (processing duplicate data)
- Performance degradation (all tasks working on the same data)
- Inconsistent results (some data not being processed)

After the fix, all concurrent tasks should correctly process their
respective data, improving system correctness and reliability.
2025-04-18 18:00:20 +08:00
Stephen Hu
b1798bafb0
Fix: handle sometimes graph index will miss explanation (#7127)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/7053

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-18 14:24:36 +08:00
BUJIQI
627fd002ae
Update utils.py (#7091)
### What problem does this PR solve?

when there are multiple entities, the variable `v` may be a list, which
will lead to this error:
```
| File "/mnt/d/wrf/ragflow/ragflow/graphrag/utils.py", line 59, in replace_all
| result = result.replace(f"{{{k}}}", v)
| TypeError: replace() argument 2 must be str, not list
```
this pr assign this `v` to be a str

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2025-04-17 17:17:09 +08:00
caiming100
d64c6870bb
Fix:When parsing documents with graph, an error occurred:[ERROR][Exception]: 'method' (#6836)
[When parsing documents with graph, an error
occurred:[ERROR][Exception]: 'method']
(https://github.com/infiniflow/ragflow/issues/6835)
### What problem does this PR solve?

Close #6786

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

Co-authored-by: cm <caiming@sict.ac.cn>
2025-04-07 12:29:25 +08:00
Zhichang Yu
fdc410e743
Fix set_graph on non-existing edge (#6777)
### What problem does this PR solve?

Fix set_graph on non-existing edge

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-03 11:09:04 +08:00
Zhichang Yu
e7a2a4b7ff
Log llm response on exception (#6750)
### What problem does this PR solve?

Log llm response on exception

### Type of change

- [x] Refactoring
2025-04-02 17:10:57 +08:00
Yue-Lyu123
20b8ccd1e9
Hotfix ece5903 (#6705)
I'm really sorry, I found that in graphrag/general/extractor.py under
def __call__, the line change.removed_nodes.extend(nodes[1:]) causes an
AttributeError: 'set' object has no attribute 'extend'. Could you please
merge the branch e666528 again? I made some modifications.
2025-04-01 12:06:28 +08:00
Yue-Lyu123
67330833af
fix: correct [AttributeError: 'set' object has no attribute 'nodes' T… (#6699)
### Related Issue: 
https://github.com/infiniflow/ragflow/issues/6653 

### Environment:
Using nightly version [ece5903]

Elasticsearch database

Thanks for the review! My fault! I realize my initial testing wasn't
passed.

In graphrag/entity_resolution.py 
 `sub_connect_graph` is a set like` {'HELLO', 'Hi', 'How are you'}`, 
Neither accessing `.nodes` nor `.nodes()` will work, **it still causes
`AttributeError: 'set' object has no attribute 'nodes'`**

In graphrag/general/extractor.py  
The `list.extend() `method performs an in-place operation, directly
modifying the original list and returning ‘None’ rather than the
modified list.
Neither accessing
`sorted(set(node0_attrs[attr].extend(node1_attrs.get(attr, []))))` nor
`sorted(set(node0_attrs[attr].extend(node1_attrs[attr])))` will work,
**it still causes `TypeError: 'NoneType' object is not iterable`**
### Type of change

- [ ] Bug Fix AttributeError: graphrag/entity_resolution.py 
- [ ] Bug Fix TypeError: graphrag/general/extractor.py
2025-04-01 09:38:21 +08:00
Yue-Lyu123
ece59034f7
fix: Resolve KnowledgeGraph entity resolution errors (#6653) (#6691)
### Related Issue: #6653
### Environment:

Using nightly version

Elasticsearch database

### Bug Description:
When clicking the "Entity Resolution" button in KnowledgeGraph,
encountered the following errors:

graphrag/entity_resolution.py

```
list(sub_connect_graph.nodes) AttributeError
```

graphrag/general/extractor.py
```
node0_attrs[attr] = sorted(set(node0_attrs[attr].extend(node1_attrs[attr])))
TypeError: 'NoneType' object is not iterable
```
```
for attr in ["keywords", "source_id"]:  
 KeyError I think attribute "keywords" is in edges not nodes
```
graphrag/utils.py
```
settings.docStoreConn.delete()  # Sync function called as async
```
### Changes Made:

Fixed AttributeError in entity_resolution.py by properly handling graph
nodes

Fixed TypeError and KeyError in extractor.py by separate operations

Corrected async/sync mismatch in document deletion call
2025-03-31 22:31:35 +08:00
Zhichang Yu
d32a35d8fd
Fix entity_types. Close #6287 and #6608 (#6632)
### What problem does this PR solve?

Fix entity_types. Close #6287 and #6608

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-28 15:00:24 +08:00
Zhichang Yu
fe0396bbb9
Introduced delete_knowledge_graph (#6605)
### What problem does this PR solve?

Introduced delete_knowledge_graph

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] Documentation Update
2025-03-27 17:16:48 +08:00
Zhichang Yu
36b62e0fab
EntityResolution batch. Close #6570 (#6602)
### What problem does this PR solve?

EntityResolution batch

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-27 16:40:36 +08:00
Zhichang Yu
c4998d0e09
Rename graphrag task lock (#6576)
### What problem does this PR solve?

Rename graphrag task lock

### Type of change

- [x] Refactoring
2025-03-26 23:48:47 +08:00
Zhichang Yu
6bf26e2a81
Optimize graphrag again (#6513)
### What problem does this PR solve?

Removed set_entity and set_relation to avoid accessing doc engine during
graph computation.
Introduced GraphChange to avoid writing unchanged chunks.

### Type of change

- [x] Performance Improvement
2025-03-26 15:34:42 +08:00
utopia2077
390086c6ab
Fix: split process bug in graphrag extract (#6423)
### What problem does this PR solve?

1. miss completion delimiter.
2. miss bracket process.
3. doc_ids return by update_graph is a set, and insert operation in
extract_community need a list.


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-24 21:41:20 +08:00
Kevin Hu
8b7e53e643
Fix: miss calculate of token number. (#6401)
### What problem does this PR solve?

#6308

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-21 17:30:38 +08:00
Kevin Hu
9ed004e90d
Refa: control the simi for entity resolution. (#6386)
### What problem does this PR solve?

#6352

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-21 13:16:34 +08:00
Kevin Hu
1333d3c02a
Fix: float transfer exception. (#6197)
### What problem does this PR solve?

#6177

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-18 11:13:44 +08:00
Zhichang Yu
c00def5b71
Fix 6030 (#6070)
### What problem does this PR solve?

Close #6030 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-14 11:29:22 +08:00
Zhichang Yu
e213873852
Optimize graphrag cache get entity (#6018)
### What problem does this PR solve?

Optimize graphrag cache get entity

### Type of change

- [x] Performance Improvement
2025-03-13 14:37:59 +08:00
Zhichang Yu
939e668096
Optimized graphrag again (#5927)
### What problem does this PR solve?

Optimized graphrag again

### Type of change

- [x] Performance Improvement
2025-03-11 18:36:10 +08:00
Zhichang Yu
6ec6ca6971
Refactor graphrag to remove redis lock (#5828)
### What problem does this PR solve?

Refactor graphrag to remove redis lock

### Type of change

- [x] Refactoring
2025-03-10 15:15:06 +08:00
Kevin Hu
1919780880 Refa: reduce default value of MAX_CONCURRENT_CHATS (#5821)
### What problem does this PR solve?

#5786

### Type of change

- [x] Refactoring
2025-03-10 11:22:06 +08:00
Kevin Hu
06b29d7da4
Fix: empty description (#5747)
### What problem does this PR solve?

#5705

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-07 11:23:37 +08:00
Kevin Hu
9fc7174612
Fix: too long context during KG issue. (#5723)
### What problem does this PR solve?

#5088

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-06 19:21:07 +08:00
郭大鹏
78b2e0be89
fix: issue #5600 (#5645)
fix: issue https://github.com/infiniflow/ragflow/issues/5600

### What problem does this PR solve?

close issue https://github.com/infiniflow/ragflow/issues/5600 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-05 16:50:37 +08:00
Zhichang Yu
f65c3ae62b
Refactored DocumentService.update_progress (#5642)
### What problem does this PR solve?

Refactored DocumentService.update_progress

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-05 14:48:03 +08:00
Kevin Hu
02c955babb
Fix: parameter error. (#5641)
### What problem does this PR solve?

#5600

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-05 14:37:51 +08:00
yihong
148a7e7002
fix: issue #5600 (#5620)
### What problem does this PR solve?

close issue #5600 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-03-05 11:10:04 +08:00
Zhichang Yu
4d6484b03e
Fix nursery.start_soon. Close #5575 (#5591)
### What problem does this PR solve?

Fix nursery.start_soon. Close #5575

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-04 14:46:54 +08:00
Zhichang Yu
c813c1ff4c
Made task_executor async to speedup parsing (#5530)
### What problem does this PR solve?

Made task_executor async to speedup parsing

### Type of change

- [x] Performance Improvement
2025-03-03 18:59:49 +08:00
Kevin Hu
1a41b92f77
More robust community report. (#5328)
### What problem does this PR solve?

#5289
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-02-25 12:58:10 +08:00
Kevin Hu
ecf5f6976f
Make node merging parallel. (#5324)
### What problem does this PR solve?

#5314

### Type of change

- [x] Performance Improvement
2025-02-25 12:02:44 +08:00
Kevin Hu
39b96849a9
Fix window size issue of ES. (#5175)
### What problem does this PR solve?

#5152

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-02-20 12:54:29 +08:00
Kevin Hu
ef95f08c48
Remove redandent code. (#5121)
### What problem does this PR solve?

#5107

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-02-19 15:46:52 +08:00
Kevin Hu
84b4b38cbb
Remove <think> for exeSql component. (#5069)
### What problem does this PR solve?

#5061
#5067

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-02-18 13:39:37 +08:00
Kevin Hu
f46448d04c
Remove <think> for KG extraction. (#5027)
### What problem does this PR solve?

#4946

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-02-17 14:06:06 +08:00
Kevin Hu
7c90b87715
Fix window size of ES issue. (#5026)
### What problem does this PR solve?

#5015

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-02-17 12:48:56 +08:00
Kevin Hu
f29da49893
Fix keyerror issue while rebuilding graph. (#5022)
### What problem does this PR solve?

#4995

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-02-17 12:02:44 +08:00
Kevin Hu
1287558f24
Fix xinference chat role order issue. (#4898)
### What problem does this PR solve?

#4831

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-02-12 13:15:23 +08:00
Kevin Hu
0d3ed37b48
Make the update script shorter. (#4854)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-02-10 18:18:49 +08:00