### What problem does this PR solve?
### Type of change
- [x] Refactoring
- [x] Performance Improvement
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
### What problem does this PR solve?
This revision performed a comprehensive check on LightRAG to ensure the
correctness of its implementation. It **did not involve** Entity
Resolution and Community Reports Generation. There is an example using
default entity types and the General chunking method, which shows good
results in both time and effectiveness. Moreover, response caching is
enabled for resuming failed tasks.
[The-Necklace.pdf](https://github.com/user-attachments/files/22042432/The-Necklace.pdf)
After:

```bash
Begin at:
Fri, 29 Aug 2025 16:48:03 GMT
Duration:
222.31 s
Progress:
16:48:04 Task has been received.
16:48:06 Page(1~7): Start to parse.
16:48:06 Page(1~7): OCR started
16:48:08 Page(1~7): OCR finished (1.89s)
16:48:11 Page(1~7): Layout analysis (3.72s)
16:48:11 Page(1~7): Table analysis (0.00s)
16:48:11 Page(1~7): Text merged (0.00s)
16:48:11 Page(1~7): Finish parsing.
16:48:12 Page(1~7): Generate 7 chunks
16:48:12 Page(1~7): Embedding chunks (0.29s)
16:48:12 Page(1~7): Indexing done (0.04s). Task done (7.84s)
16:48:17 Start processing for f421fb06849e11f0bdd32724b93a52b2: She had no dresses, no je...
16:48:17 Start processing for f421fb06849e11f0bdd32724b93a52b2: Her husband, already half...
16:48:17 Start processing for f421fb06849e11f0bdd32724b93a52b2: And this life lasted ten ...
16:48:17 Start processing for f421fb06849e11f0bdd32724b93a52b2: Then she asked, hesitatin...
16:49:30 Completed processing for f421fb06849e11f0bdd32724b93a52b2: She had no dresses, no je... after 1 gleanings, 21985 tokens.
16:49:30 Entities extraction of chunk 3 1/7 done, 12 nodes, 13 edges, 21985 tokens.
16:49:40 Completed processing for f421fb06849e11f0bdd32724b93a52b2: Finally, she replied, hes... after 1 gleanings, 22584 tokens.
16:49:40 Entities extraction of chunk 5 2/7 done, 19 nodes, 19 edges, 22584 tokens.
16:50:02 Completed processing for f421fb06849e11f0bdd32724b93a52b2: Then she asked, hesitatin... after 1 gleanings, 24610 tokens.
16:50:02 Entities extraction of chunk 0 3/7 done, 16 nodes, 28 edges, 24610 tokens.
16:50:03 Completed processing for f421fb06849e11f0bdd32724b93a52b2: And this life lasted ten ... after 1 gleanings, 24031 tokens.
16:50:04 Entities extraction of chunk 1 4/7 done, 24 nodes, 22 edges, 24031 tokens.
16:50:14 Completed processing for f421fb06849e11f0bdd32724b93a52b2: So they begged the jewell... after 1 gleanings, 24635 tokens.
16:50:14 Entities extraction of chunk 6 5/7 done, 27 nodes, 26 edges, 24635 tokens.
16:50:29 Completed processing for f421fb06849e11f0bdd32724b93a52b2: Her husband, already half... after 1 gleanings, 25758 tokens.
16:50:29 Entities extraction of chunk 2 6/7 done, 25 nodes, 35 edges, 25758 tokens.
16:51:35 Completed processing for f421fb06849e11f0bdd32724b93a52b2: The Necklace By Guy de Ma... after 1 gleanings, 27491 tokens.
16:51:35 Entities extraction of chunk 4 7/7 done, 39 nodes, 37 edges, 27491 tokens.
16:51:35 Entities and relationships extraction done, 147 nodes, 177 edges, 171094 tokens, 198.58s.
16:51:35 Entities merging done, 0.01s.
16:51:35 Relationships merging done, 0.01s.
16:51:35 ignored 7 relations due to missing entities.
16:51:35 generated subgraph for doc f421fb06849e11f0bdd32724b93a52b2 in 198.68 seconds.
16:51:35 run_graphrag f421fb06849e11f0bdd32724b93a52b2 graphrag_task_lock acquired
16:51:35 set_graph removed 0 nodes and 0 edges from index in 0.00s.
16:51:35 Get embedding of nodes: 9/147
16:51:35 Get embedding of nodes: 109/147
16:51:37 Get embedding of edges: 9/170
16:51:37 Get embedding of edges: 109/170
16:51:40 set_graph converted graph change to 319 chunks in 4.21s.
16:51:40 Insert chunks: 4/319
16:51:40 Insert chunks: 104/319
16:51:40 Insert chunks: 204/319
16:51:40 Insert chunks: 304/319
16:51:40 set_graph added/updated 147 nodes and 170 edges from index in 0.53s.
16:51:40 merging subgraph for doc f421fb06849e11f0bdd32724b93a52b2 into the global graph done in 4.79 seconds.
16:51:40 Knowledge Graph done (204.29s)
```
Before:

```bash
Begin at:
Fri, 29 Aug 2025 17:00:47 GMT
processDuration:
173.38 s
Progress:
17:00:49 Task has been received.
17:00:51 Page(1~7): Start to parse.
17:00:51 Page(1~7): OCR started
17:00:53 Page(1~7): OCR finished (1.82s)
17:00:57 Page(1~7): Layout analysis (3.64s)
17:00:57 Page(1~7): Table analysis (0.00s)
17:00:57 Page(1~7): Text merged (0.00s)
17:00:57 Page(1~7): Finish parsing.
17:00:57 Page(1~7): Generate 7 chunks
17:00:57 Page(1~7): Embedding chunks (0.31s)
17:00:57 Page(1~7): Indexing done (0.03s). Task done (7.88s)
17:00:57 created task graphrag
17:01:00 Task has been received.
17:02:17 Entities extraction of chunk 1 1/7 done, 9 nodes, 9 edges, 10654 tokens.
17:02:31 Entities extraction of chunk 2 2/7 done, 12 nodes, 13 edges, 11066 tokens.
17:02:33 Entities extraction of chunk 4 3/7 done, 9 nodes, 10 edges, 10433 tokens.
17:02:42 Entities extraction of chunk 5 4/7 done, 11 nodes, 14 edges, 11290 tokens.
17:02:52 Entities extraction of chunk 6 5/7 done, 13 nodes, 15 edges, 11039 tokens.
17:02:55 Entities extraction of chunk 3 6/7 done, 14 nodes, 13 edges, 11466 tokens.
17:03:32 Entities extraction of chunk 0 7/7 done, 19 nodes, 18 edges, 13107 tokens.
17:03:32 Entities and relationships extraction done, 71 nodes, 89 edges, 79055 tokens, 149.66s.
17:03:32 Entities merging done, 0.01s.
17:03:32 Relationships merging done, 0.01s.
17:03:32 ignored 1 relations due to missing entities.
17:03:32 generated subgraph for doc b1d9d3b6848711f0aacd7ddc0714c4d3 in 149.69 seconds.
17:03:32 run_graphrag b1d9d3b6848711f0aacd7ddc0714c4d3 graphrag_task_lock acquired
17:03:32 set_graph removed 0 nodes and 0 edges from index in 0.00s.
17:03:32 Get embedding of nodes: 9/71
17:03:33 Get embedding of edges: 9/88
17:03:34 set_graph converted graph change to 161 chunks in 2.27s.
17:03:34 Insert chunks: 4/161
17:03:34 Insert chunks: 104/161
17:03:34 set_graph added/updated 71 nodes and 88 edges from index in 0.28s.
17:03:34 merging subgraph for doc b1d9d3b6848711f0aacd7ddc0714c4d3 into the global graph done in 2.60 seconds.
17:03:34 Knowledge Graph done (153.18s)
```
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
- [x] Performance Improvement
### What problem does this PR solve?
Support passing the attribute check when the upstream has already made
sure it.
### Type of change
- [X] Performance Improvement
### What problem does this PR solve?
When you removed any document in a knowledge base using knowledge graph,
the graph's `removed_kwd` is set to "Y".
However, in the function `graphrag.utils.get_gaph`, `rebuild_graph`
method is passed and directly return `None` while `removed_kwd=Y`,
making residual part of the graph abandoned (but old entity data still
exist in db).
Besides, infinity instance actually pass deleting graph components'
`source_id` when removing document. It may cause wrong graph after
rebuild.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
## Problem Description
Multiple files in the RAGFlow project contain closure trap issues when
using lambda functions with `trio.open_nursery()`. This problem causes
concurrent tasks created in loops to reference the same variable,
resulting in all tasks processing the same data (the data from the last
iteration) rather than each task processing its corresponding data from
the loop.
## Issue Details
When using a `lambda` to create a closure function and passing it to
`nursery.start_soon()` within a loop, the lambda function captures a
reference to the loop variable rather than its value. For example:
```python
# Problematic code
async with trio.open_nursery() as nursery:
for d in docs:
nursery.start_soon(lambda: doc_keyword_extraction(chat_mdl, d, topn))
```
In this pattern, when concurrent tasks begin execution, `d` has already
become the value after the loop ends (typically the last element),
causing all tasks to use the same data.
## Fix Solution
Changed the way concurrent tasks are created with `nursery.start_soon()`
by leveraging Trio's API design to directly pass the function and its
arguments separately:
```python
# Fixed code
async with trio.open_nursery() as nursery:
for d in docs:
nursery.start_soon(doc_keyword_extraction, chat_mdl, d, topn)
```
This way, each task uses the parameter values at the time of the
function call, rather than references captured through closures.
## Fixed Files
Fixed closure traps in the following files:
1. `rag/svr/task_executor.py`: 3 fixes, involving document keyword
extraction, question generation, and tag processing
2. `rag/raptor.py`: 1 fix, involving document summarization
3. `graphrag/utils.py`: 2 fixes, involving graph node and edge
processing
4. `graphrag/entity_resolution.py`: 2 fixes, involving entity resolution
and graph node merging
5. `graphrag/general/mind_map_extractor.py`: 2 fixes, involving document
processing
6. `graphrag/general/extractor.py`: 3 fixes, involving content
processing and graph node/edge merging
7. `graphrag/general/community_reports_extractor.py`: 1 fix, involving
community report extraction
## Potential Impact
This fix resolves a serious concurrency issue that could have caused:
- Data processing errors (processing duplicate data)
- Performance degradation (all tasks working on the same data)
- Inconsistent results (some data not being processed)
After the fix, all concurrent tasks should correctly process their
respective data, improving system correctness and reliability.
### What problem does this PR solve?
when there are multiple entities, the variable `v` may be a list, which
will lead to this error:
```
| File "/mnt/d/wrf/ragflow/ragflow/graphrag/utils.py", line 59, in replace_all
| result = result.replace(f"{{{k}}}", v)
| TypeError: replace() argument 2 must be str, not list
```
this pr assign this `v` to be a str
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### Related Issue: #6653
### Environment:
Using nightly version
Elasticsearch database
### Bug Description:
When clicking the "Entity Resolution" button in KnowledgeGraph,
encountered the following errors:
graphrag/entity_resolution.py
```
list(sub_connect_graph.nodes) AttributeError
```
graphrag/general/extractor.py
```
node0_attrs[attr] = sorted(set(node0_attrs[attr].extend(node1_attrs[attr])))
TypeError: 'NoneType' object is not iterable
```
```
for attr in ["keywords", "source_id"]:
KeyError I think attribute "keywords" is in edges not nodes
```
graphrag/utils.py
```
settings.docStoreConn.delete() # Sync function called as async
```
### Changes Made:
Fixed AttributeError in entity_resolution.py by properly handling graph
nodes
Fixed TypeError and KeyError in extractor.py by separate operations
Corrected async/sync mismatch in document deletion call
### What problem does this PR solve?
Introduced delete_knowledge_graph
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] Documentation Update
### What problem does this PR solve?
Removed set_entity and set_relation to avoid accessing doc engine during
graph computation.
Introduced GraphChange to avoid writing unchanged chunks.
### Type of change
- [x] Performance Improvement
### What problem does this PR solve?
Introduced [beartype](https://github.com/beartype/beartype) for runtime
type-checking.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
The default value of Python function parameters cannot be mutable.
Modifying this parameter inside the function will permanently change the
default value
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)