501 Commits

Author SHA1 Message Date
yangdx
05bc5cfb64 Improve task execution with early failure detection
- Add early failure detection for async tasks
- Cancel pending tasks on first exception
2025-07-19 10:14:22 +08:00
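The fail-fast behavior described in this commit can be sketched with the standard `asyncio` library. This is a minimal illustration, not the code from the commit; the helper name `run_fail_fast` is hypothetical.

```python
import asyncio


async def run_fail_fast(coros):
    """Run coroutines concurrently; on the first exception, cancel the rest and re-raise.

    Hypothetical helper for illustration only.
    """
    tasks = [asyncio.ensure_future(c) for c in coros]
    if not tasks:
        return []

    # Returns as soon as any task raises, or when all tasks have finished.
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_EXCEPTION)

    failed = [t for t in done if not t.cancelled() and t.exception() is not None]
    if failed:
        # Early failure: cancel everything still running.
        for t in pending:
            t.cancel()
        # Let the cancellations settle before propagating the error.
        await asyncio.gather(*pending, return_exceptions=True)
        raise failed[0].exception()

    # No failure: every task is done, so results can be read in input order.
    return [t.result() for t in tasks]
```

Compared with a plain `asyncio.gather`, which propagates the first exception but leaves the other tasks running, this pattern stops the remaining work as soon as one task fails.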
yangdx
5f7cb437e8 Centralize query parameters into LightRAG class
This commit refactors query parameter management by consolidating settings like `top_k`, token limits, and thresholds into the `LightRAG` class, and consistently sourcing parameters from a single location.
2025-07-15 23:56:49 +08:00
yangdx
47341d3a71 Merge branch 'main' into rerank 2025-07-15 16:12:33 +08:00
yangdx
ccc2a20071 feat: remove deprecated MAX_TOKEN_SUMMARY parameter to prevent LLM output truncation
- Remove MAX_TOKEN_SUMMARY parameter and related configurations
- Eliminate forced token-based truncation in entity/relationship descriptions
- Switch to fragment-count based summarization logic using FORCE_LLM_SUMMARY_ON_MERGE
- Update FORCE_LLM_SUMMARY_ON_MERGE default from 6 to 4 for better summarization
- Clean up documentation, environment examples, and API display code
- Preserve backward compatibility by graceful parameter removal

This change resolves issues where LLMs were forcibly truncating entity relationship
descriptions mid-sentence, leading to incomplete and potentially inaccurate knowledge
graph content. The new approach allows LLMs to generate complete descriptions while
still providing summarization when multiple fragments need to be merged.

Breaking Change: None - parameter removal is backward compatible
Fixes: Entity relationship description truncation issues
2025-07-15 12:26:33 +08:00
zrguo
7c882313bb remove chunk_rerank_top_k 2025-07-15 11:52:34 +08:00
yangdx
b03bb48e24 feat: Refine summary logic and add dedicated Ollama num_ctx config
- Refactor the trigger condition for LLM-based summarization of entities and relations. Instead of relying on character length, the summary is now triggered when the number of merged description fragments exceeds a configured threshold. This provides a more robust and logical condition for consolidation.
- Introduce the `OLLAMA_NUM_CTX` environment variable to explicitly configure the context window size (`num_ctx`) for Ollama models. This decouples the model's context length from the `MAX_TOKENS` parameter, which is now specifically used to limit input for summary generation, making the configuration clearer and more flexible.
- Updated `README` files, `env.example`, and default values to reflect these changes.
2025-07-14 01:55:04 +08:00
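The fragment-count trigger described in this commit might look roughly like the sketch below. The function name `merge_descriptions`, the `summarize` callback, the separator, and the strict `>` comparison are all assumptions for illustration; the commit message only states that summarization runs when the fragment count exceeds a configured threshold.

```python
from typing import Callable, Iterable, List


def merge_descriptions(
    fragments: Iterable[str],
    threshold: int,
    summarize: Callable[[List[str]], str],
    sep: str = " | ",  # hypothetical separator
) -> str:
    """Join description fragments, calling the LLM only when there are many of them.

    Hypothetical helper: `summarize` stands in for an LLM call.
    """
    # Drop empty fragments and exact duplicates while preserving order.
    unique = list(dict.fromkeys(f.strip() for f in fragments if f.strip()))

    # The trigger is the number of fragments, not their character length.
    if len(unique) > threshold:
        return summarize(unique)
    return sep.join(unique)
```

With a count-based trigger, a few long descriptions are kept verbatim, while many short ones are still consolidated.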
yangdx
03b40937f7 Reduce embedding concurrency limit from 16 to 8 2025-07-13 03:13:52 +08:00
yangdx
39965d7ded Move merging stage back under control of the max parallel insert semaphore 2025-07-12 03:32:08 +08:00
yangdx
3afdd1b67c Fix initial count error for multi-process lock with key 2025-07-11 20:39:08 +08:00
yangdx
c47747da9e Merge branch 'main' into merge_lock_with_key 2025-07-11 16:37:10 +08:00
yangdx
ef4870fda5 Combined entity and edge processing tasks and optimize merging with semaphore 2025-07-11 16:34:54 +08:00
yangdx
9aa2ed0837 Merge branch 'main' into rerank 2025-07-09 15:33:39 +08:00
yangdx
207f0a7f2a Merge branch 'main' into merge_lock_with_key 2025-07-09 09:25:28 +08:00
yangdx
cb3bfc0e5b Release semaphore before merge stage 2025-07-09 09:24:44 +08:00
Anton Vice
b192f8c9a3 Fix: Handle NoneType error when processing documents without a file path
The document processing pipeline would crash with a TypeError when a document was submitted as raw text via the API, as the file_path attribute would be None. This change adds a check to handle the None case gracefully, preventing the crash and allowing text-based documents to be indexed correctly.
2025-07-08 19:35:22 -03:00
zrguo
71cb3adb4f Merge branch 'main' into rerank 2025-07-08 15:10:23 +08:00
yangdx
56d43de58a Merge branch 'main' into merge_lock_with_key 2025-07-08 12:46:31 +08:00
zrguo
f5c80d7cde Simplify Configuration 2025-07-08 11:16:34 +08:00
yangdx
9b7b2a9b0f Reduce default embedding batch size from 32 to 10 2025-07-08 11:00:09 +08:00
zrguo
75dd4f3498 add rerank model 2025-07-07 22:44:59 +08:00
yangdx
ef79088f60 Move max_graph_nodes to global config 2025-07-07 21:53:57 +08:00
yangdx
033098c1bc Feat: Add WORKSPACE support to all storage types 2025-07-07 00:57:21 +08:00
yangdx
1b2d295a4f Remove namespace_prefix 2025-07-06 00:16:47 +08:00
yangdx
98150e80b8 Improved empty/whitespace file handling
- Better detection of whitespace-only files
- Changed error to warning for empty chunks
2025-07-05 23:16:39 +08:00
xuewei
648a87653f Text chunk is blank 2025-07-05 14:28:42 +08:00
yangdx
a9e10ae810 Update logger messages 2025-07-03 14:08:19 +08:00
yangdx
e56734cb8b Refac: Optimize document deletion performance
- Add chunks_list to doc_status
- Add llm_cache_list to text_chunks
- Implemented for storage types: JsonKV and Redis
2025-07-03 04:18:25 +08:00
zrguo
479865a271 Add max_gleaning to env 2025-07-01 17:13:33 +08:00
yangdx
e70f5a35e5 Refac: Add KG rebuild logging with pipeline status
- Logs detailed progress, including warnings and failures, to the pipeline status.
- Adds counters to report the total number of successfully rebuilt entities and relationships upon completion.
2025-06-29 21:27:12 +08:00
yangdx
3a8a99b73d feat(postgres): Implement text_chunks upsert for PGKVStorage 2025-06-28 14:37:35 +08:00
yangdx
8fb1c09b08 Refac: pipeline message 2025-06-26 01:00:54 +08:00
yangdx
bdcd55a871 Feat: Add delete upload file option to document deletion 2025-06-25 19:02:46 +08:00
yangdx
51bb0471cd Change the API for deleting documents to support deleting multiple documents at once. 2025-06-25 16:19:49 +08:00
yangdx
495d6c8cce Improve the pipeline status message for document deletion 2025-06-25 15:46:58 +08:00
yangdx
2aaa6d5f7d Fix linting 2025-06-25 14:59:45 +08:00
yangdx
8a365533d7 Add comprehensive error handling for document deletion 2025-06-25 14:58:41 +08:00
yangdx
da46b341dc feat: Optimize document deletion performance
- To enhance performance during document deletion, new batch-get methods, `get_nodes_by_chunk_ids` and `get_edges_by_chunk_ids`, have been added to the graph storage layer (`BaseGraphStorage` and its implementations). The [`adelete_by_doc_id`](lightrag/lightrag.py:1681) function now leverages these methods to avoid unnecessary iteration over the entire knowledge graph, significantly improving efficiency.
- Graph storage updated: Networkx, Neo4j, Postgres AGE
2025-06-25 12:37:57 +08:00
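The idea behind the batch-get methods can be illustrated with a toy in-memory store. Only the method names `get_nodes_by_chunk_ids` and `get_edges_by_chunk_ids` come from the commit message; the class, the signatures, the return types, and the inverted-index layout are assumptions and do not reflect the actual `BaseGraphStorage` implementations.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


class ToyGraphStorage:
    """Hypothetical in-memory store that indexes nodes and edges by source chunk id."""

    def __init__(self) -> None:
        self._nodes_by_chunk: Dict[str, Set[str]] = defaultdict(set)
        self._edges_by_chunk: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def add_node(self, node_id: str, chunk_ids: Iterable[str]) -> None:
        for chunk_id in chunk_ids:
            self._nodes_by_chunk[chunk_id].add(node_id)

    def add_edge(self, src: str, tgt: str, chunk_ids: Iterable[str]) -> None:
        for chunk_id in chunk_ids:
            self._edges_by_chunk[chunk_id].add((src, tgt))

    def get_nodes_by_chunk_ids(self, chunk_ids: Iterable[str]) -> List[str]:
        # One lookup per chunk id instead of a scan over every node in the graph.
        found: Set[str] = set()
        for chunk_id in chunk_ids:
            found |= self._nodes_by_chunk.get(chunk_id, set())
        return sorted(found)

    def get_edges_by_chunk_ids(self, chunk_ids: Iterable[str]) -> List[Tuple[str, str]]:
        found: Set[Tuple[str, str]] = set()
        for chunk_id in chunk_ids:
            found |= self._edges_by_chunk.get(chunk_id, set())
        return sorted(found)
```

A deletion routine can then pass the chunk ids of the document being deleted and receive only the affected nodes and edges, so the cost scales with the document rather than with the whole graph.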
yangdx
2946bbdb71 Add TODO: there is a performance issue when iterating get_all_labels 2025-06-24 11:32:28 +08:00
yangdx
e6baffe10c Add return status to entity and relation delete operations 2025-06-23 21:39:45 +08:00
yangdx
bd487dd252 Unify the return status string of document APIs 2025-06-23 21:38:47 +08:00
yangdx
ce50135efb Improved docstring for document deletion method 2025-06-23 21:08:51 +08:00
yangdx
ebcabe29ca Remove duplicated graph db lock 2025-06-23 18:46:01 +08:00
yangdx
5099ac8213 Fix linting 2025-06-23 18:41:30 +08:00
yangdx
a215939c41 Refac: Avoid duplicate edge processing in adelete_by_doc_id 2025-06-23 18:39:36 +08:00
yangdx
a0be65d5d9 Refac: Return status and messages for delete by doc id operation 2025-06-23 17:59:27 +08:00
yangdx
9fae0eadff feat: Ensure thread safety for graph write operations
Add a lock to delete, adelete_by_entity, and adelete_by_relation methods to prevent race conditions and ensure data consistency during concurrent modifications to the knowledge graph.
2025-06-23 09:57:56 +08:00
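A minimal sketch of lock-guarded graph writes, assuming a single `asyncio.Lock` shared by all write methods. The class `ToyGraph` and its data layout are hypothetical, and the commit message does not say which lock type the project actually uses.

```python
import asyncio
from typing import Iterable, Tuple


class ToyGraph:
    """Hypothetical graph whose write operations are serialized by one lock."""

    def __init__(self, nodes: Iterable[str], edges: Iterable[Tuple[str, str]]) -> None:
        self._lock = asyncio.Lock()
        self.nodes = set(nodes)
        self.edges = set(edges)

    async def adelete_by_entity(self, name: str) -> bool:
        async with self._lock:
            if name not in self.nodes:
                return False
            # Simulated storage I/O: without the lock, another writer could
            # run here and observe a half-deleted entity.
            await asyncio.sleep(0)
            self.edges = {e for e in self.edges if name not in e}
            self.nodes.discard(name)
            return True

    async def adelete_by_relation(self, src: str, tgt: str) -> bool:
        async with self._lock:
            await asyncio.sleep(0)
            if (src, tgt) in self.edges:
                self.edges.remove((src, tgt))
                return True
            return False
```

Because every write path takes the same lock, a second concurrent delete of the same entity waits for the first to finish and then sees that the entity is already gone.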
zrguo
4937de8809 Update 2025-06-22 15:12:09 +08:00
zrguo
3abdc42549 Merge branch 'main' into delete_doc 2025-06-16 17:02:21 +08:00
kwilt
09cbcc4572 fix typo: "extrat" -> extract 2025-06-09 08:28:14 -05:00
zrguo
ead82a8dbd update delete_by_doc_id 2025-06-09 18:52:34 +08:00