396 Commits

Author SHA1 Message Date
yangdx
05bc5cfb64 Improve task execution with early failure detection
- Add early failure detection for async tasks
- Cancel pending tasks on first exception
2025-07-19 10:14:22 +08:00
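The early-failure behaviour described in this commit can be sketched as follows. This is an illustrative example only, not the code from the commit: the helper name `run_with_early_failure` is hypothetical, and it assumes the tasks are plain asyncio coroutines.

```python
import asyncio


async def run_with_early_failure(coros):
    """Run coroutines concurrently; on the first exception, cancel the rest and re-raise.

    Hypothetical helper, not taken from the repository.
    """
    tasks = [asyncio.create_task(c) for c in coros]
    if not tasks:
        return []
    # Returns as soon as any task raises, or when all tasks have finished.
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_EXCEPTION)
    failed = next(
        (t for t in done if not t.cancelled() and t.exception() is not None), None
    )
    if failed is not None:
        for t in pending:
            t.cancel()
        # Wait for the cancellations to complete so no task is left dangling.
        await asyncio.gather(*pending, return_exceptions=True)
        raise failed.exception()
    return [t.result() for t in tasks]
```

With `asyncio.FIRST_EXCEPTION`, a failure surfaces immediately instead of after every sibling task has run to completion.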
yangdx
12d4f12e57 fix: sort edge_key components in _locked_process_edges for consistent locking
- Ensures bidirectional relationships use same lock key
- Maintains thread safety for knowledge graph edge operations
2025-07-19 07:36:50 +08:00
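A minimal sketch of why sorting the key components matters, assuming an edge is identified by its two endpoint names. The function `edge_lock_key` is hypothetical and does not reproduce the actual key format used in `_locked_process_edges`.

```python
from typing import Tuple


def edge_lock_key(src: str, tgt: str) -> Tuple[str, ...]:
    """Return the same key for (src, tgt) and (tgt, src).

    Hypothetical helper: sorting the endpoints means both directions of a
    relationship contend for one lock instead of two unrelated ones.
    """
    return tuple(sorted((src, tgt)))
```

Without the sort, `("Alice", "Bob")` and `("Bob", "Alice")` would map to different locks, and two coroutines could update the same undirected edge at once.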
yangdx
be2d938c84 Fix file path handling in graph operations
- Filter out empty file paths
- Handle missing file_path fields
2025-07-17 18:33:14 +08:00
yangdx
7184c7b3ab fix: change default edge weight from 0.0 to 1.0 in entity extraction and graph storage
- Update extract_entities function in operate.py to use 1.0 as default weight
- Fix Neo4j implementation to use 1.0 instead of 0.0 for missing edge weights
- Fix Memgraph implementation to use 1.0 instead of 0.0 for missing edge weights
- Ensures consistent non-zero default weights across all graph storage backends
2025-07-17 11:30:49 +08:00
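The default described here could look roughly like the sketch below. The names `DEFAULT_EDGE_WEIGHT` and `edge_weight` are assumptions for illustration, and edge properties are assumed to arrive as a plain dict; the Neo4j and Memgraph code in the commit is not reproduced.

```python
from typing import Any, Dict

DEFAULT_EDGE_WEIGHT = 1.0  # assumed constant name, for illustration only


def edge_weight(props: Dict[str, Any]) -> float:
    """Return the edge weight, falling back to 1.0 when it is missing or None."""
    raw = props.get("weight")
    # An explicit 0.0 is kept; only a missing value gets the default.
    return DEFAULT_EDGE_WEIGHT if raw is None else float(raw)
```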
yangdx
b1276a079f Fix linting 2025-07-15 23:57:24 +08:00
yangdx
5f7cb437e8 Centralize query parameters into LightRAG class
This commit refactors query parameter management by consolidating settings like `top_k`, token limits, and thresholds into the `LightRAG` class, and consistently sourcing parameters from a single location.
2025-07-15 23:56:49 +08:00
zrguo
3ead0489b8 Remove "rank", "weight", "keywords" 2025-07-15 21:47:33 +08:00
zrguo
1541034816 Add DEFAULT_RELATED_CHUNK_NUMBER 2025-07-15 21:35:12 +08:00
zrguo
42f1fd60f4 Update operate.py 2025-07-15 18:59:52 +08:00
zrguo
29e82723e6 Update operate.py 2025-07-15 18:57:57 +08:00
yangdx
1927cb2685 Fix linting 2025-07-15 17:24:57 +08:00
yangdx
47341d3a71 Merge branch 'main' into rerank 2025-07-15 16:12:33 +08:00
yangdx
e8e1f6ab56 feat: centralize environment variable defaults in constants.py 2025-07-15 16:11:50 +08:00
yangdx
ccc2a20071 feat: remove deprecated MAX_TOKEN_SUMMARY parameter to prevent LLM output truncation
- Remove MAX_TOKEN_SUMMARY parameter and related configurations
- Eliminate forced token-based truncation in entity/relationship descriptions
- Switch to fragment-count based summarization logic using FORCE_LLM_SUMMARY_ON_MERGE
- Update FORCE_LLM_SUMMARY_ON_MERGE default from 6 to 4 for better summarization
- Clean up documentation, environment examples, and API display code
- Preserve backward compatibility by graceful parameter removal

This change resolves issues where LLMs were forcibly truncating entity relationship
descriptions mid-sentence, leading to incomplete and potentially inaccurate knowledge
graph content. The new approach allows LLMs to generate complete descriptions while
still providing summarization when multiple fragments need to be merged.

Breaking Change: None - parameter removal is backward compatible
Fixes: Entity relationship description truncation issues
2025-07-15 12:26:33 +08:00
zrguo
7c882313bb remove chunk_rerank_top_k 2025-07-15 11:52:34 +08:00
zrguo
86a0a4872e Update operate.py 2025-07-15 10:56:48 +08:00
zrguo
7edf087baa Update operate.py 2025-07-14 18:43:22 +08:00
zrguo
bbd91d3a18 Update operate.py 2025-07-14 16:37:25 +08:00
zrguo
4e425b1b59 Revert "update from main"
This reverts commit 1d0376d6a926ef60d641af4406dacf5b8bbb430f.
2025-07-14 16:29:00 +08:00
zrguo
1d0376d6a9 update from main 2025-07-14 16:27:49 +08:00
zrguo
c9cbd2d3e0 Merge branch 'main' into rerank 2025-07-14 16:24:29 +08:00
zrguo
ef2115d437 Update token limit 2025-07-14 15:53:48 +08:00
yangdx
b03bb48e24 feat: Refine summary logic and add dedicated Ollama num_ctx config
- Refactor the trigger condition for LLM-based summarization of entities and relations. Instead of relying on character length, the summary is now triggered when the number of merged description fragments exceeds a configured threshold. This provides a more robust and logical condition for consolidation.
- Introduce the `OLLAMA_NUM_CTX` environment variable to explicitly configure the context window size (`num_ctx`) for Ollama models. This decouples the model's context length from the `MAX_TOKENS` parameter, which is now specifically used to limit input for summary generation, making the configuration clearer and more flexible.
- Updated `README` files, `env.example`, and default values to reflect these changes.
2025-07-14 01:55:04 +08:00
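The fragment-count trigger can be illustrated with a small sketch. The function name `should_summarize` is hypothetical, and the strict "greater than" comparison follows the wording "exceeds" in this message; the repository's exact comparison is not shown here.

```python
from typing import List


def should_summarize(fragments: List[str], threshold: int) -> bool:
    """Decide on LLM summarization by fragment count rather than text length."""
    # Blank fragments carry no description, so they are not counted.
    non_empty = [f for f in fragments if f.strip()]
    return len(non_empty) > threshold
```

A single long description therefore never triggers a summary on its own; only an accumulation of merged fragments does.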
yangdx
f185b3fb38 Optimize async task limits for graph processing
- Increased concurrency for graph operations
- Renamed variables for clarity
- Updated status messages
2025-07-13 21:51:19 +08:00
yangdx
e4bf4d19a0 Optimize knowledge graph rebuild with parallel processing
- Add parallel processing for KG rebuild
- Implement keyed locks for data consistency
2025-07-12 13:22:56 +08:00
yangdx
a85d7054d4 fix: move node existence check inside lock to prevent race condition
Move knowledge_graph_inst.has_node check inside get_storage_keyed_lock
in _merge_edges_then_upsert to ensure atomic check-then-act operations
and prevent duplicate node creation during concurrent updates.
2025-07-12 12:22:32 +08:00
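The check-then-act pattern this fix protects can be sketched as follows. This is an assumption-laden illustration: a dict stands in for the graph storage, a single `asyncio.Lock` stands in for the keyed lock, and `ensure_node` is a hypothetical name.

```python
import asyncio


async def ensure_node(graph: dict, lock: asyncio.Lock, node_id: str) -> bool:
    """Create node_id if missing; return True only for the caller that created it."""
    async with lock:
        # The existence check and the write sit under the same lock, so no
        # other coroutine can run between them.
        if node_id in graph:
            return False
        await asyncio.sleep(0)  # stand-in for an awaited storage call
        graph[node_id] = {"entity_id": node_id}
        return True


async def demo_ensure_node() -> int:
    graph, lock = {}, asyncio.Lock()
    results = await asyncio.gather(
        *(ensure_node(graph, lock, "Alice") for _ in range(10))
    )
    return sum(results)  # number of callers that actually created the node
```

If the membership check ran before acquiring the lock, several coroutines could all observe the node as missing and each create it.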
yangdx
2ade3067f8 Refac: Generalize keyed lock with namespace support
Refactored the `KeyedUnifiedLock` to be generic and support dynamic namespaces. This decouples the locking mechanism from a specific "GraphDB" implementation, allowing it to be reused across different components and workspaces safely.

Key changes:
- `KeyedUnifiedLock` now takes a `namespace` parameter on lock acquisition.
- Renamed `_graph_db_lock_keyed` to the more generic `_storage_keyed_lock`
- Replaced `get_graph_db_lock_keyed` with `get_storage_keyed_lock` to support namespaces
2025-07-12 12:10:12 +08:00
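The idea of a namespaced keyed lock can be sketched as below. The class `NamespacedKeyedLock` is a hypothetical, single-process simplification and does not reproduce the `KeyedUnifiedLock` API; it only shows how a namespace keeps identical keys from different components or workspaces apart.

```python
import asyncio
from typing import Dict, Tuple


class NamespacedKeyedLock:
    """One asyncio.Lock per (namespace, key) pair (illustrative only)."""

    def __init__(self) -> None:
        self._locks: Dict[Tuple[str, str], asyncio.Lock] = {}

    def get(self, namespace: str, key: str) -> asyncio.Lock:
        lock_id = (namespace, key)
        # Create the lock lazily the first time this pair is requested.
        if lock_id not in self._locks:
            self._locks[lock_id] = asyncio.Lock()
        return self._locks[lock_id]
```

Usage inside a coroutine would be `async with locks.get("workspace1:graph", "Alice"): ...`, where the namespace string is an invented example.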
yangdx
22c36f2fd2 Optimize log messages 2025-07-12 02:41:31 +08:00
yangdx
c47747da9e Merge branch 'main' into merge_lock_with_key 2025-07-11 16:37:10 +08:00
yangdx
ef4870fda5 Combine entity and edge processing tasks and optimize merging with semaphore 2025-07-11 16:34:54 +08:00
zrguo
b0479c078a fix process_chunks_unified() 2025-07-09 15:55:38 +08:00
zrguo
e1541caea9 Update webui setting 2025-07-09 12:10:06 +08:00
yangdx
207f0a7f2a Merge branch 'main' into merge_lock_with_key 2025-07-09 09:25:28 +08:00
yangdx
cb3bfc0e5b Release semaphore before merge stage 2025-07-09 09:24:44 +08:00
yangdx
e9c3503f77 Update logger info 2025-07-09 04:36:52 +08:00
yangdx
5d4484882a Merge branch 'main' into rerank 2025-07-09 03:59:04 +08:00
zrguo
c295d355a0 fix chunk_top_k limiting 2025-07-08 15:05:30 +08:00
SLKun
5f330ec11a remove <think> tag for entities and keywords extraction 2025-07-08 14:59:15 +08:00
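Stripping reasoning blocks before parsing extraction output might look like the sketch below. The helper `strip_think` is hypothetical and assumes well-formed, non-nested `<think>...</think>` pairs; the commit's actual implementation is not shown here.

```python
import re

# Non-greedy match so each <think>...</think> block is removed on its own;
# DOTALL lets the block span multiple lines.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def strip_think(text: str) -> str:
    """Remove <think> blocks from model output and trim surrounding whitespace."""
    return _THINK_BLOCK.sub("", text).strip()
```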
zrguo
04a57445da update chunks truncation method 2025-07-08 13:31:05 +08:00
yangdx
56d43de58a Merge branch 'main' into merge_lock_with_key 2025-07-08 12:46:31 +08:00
zrguo
f5c80d7cde Simplify Configuration 2025-07-08 11:16:34 +08:00
zrguo
75dd4f3498 add rerank model 2025-07-07 22:44:59 +08:00
yangdx
fe13475234 Fix linting 2025-07-05 12:07:37 +08:00
yangdx
a2e59dd078 fix: prevent empty entity names after normalization in extraction
Added validation checks in entity and relationship extraction functions to filter out entities that become empty strings after normalize_extracted_info processing. This prevents empty labels from appearing in get_all_labels() results and maintains knowledge graph data integrity.
2025-07-05 12:06:34 +08:00
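The validation described above amounts to filtering after normalization rather than before. In this sketch, `clean_name` is a hypothetical stand-in for the real normalization step (it only trims whitespace and surrounding quotes) and `keep_valid_entities` is likewise an invented name.

```python
from typing import Iterable, List


def clean_name(raw: str) -> str:
    """Hypothetical normalizer: trim whitespace and surrounding quotes."""
    return raw.strip().strip("\"'").strip()


def keep_valid_entities(names: Iterable[str]) -> List[str]:
    """Normalize each name and drop those that end up empty."""
    cleaned = (clean_name(n) for n in names)
    return [n for n in cleaned if n]
```

A name such as `'""'` is non-empty before normalization but empty after it, which is why the check has to come last.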
yangdx
6c2ae40d7d Refac: Enhance KG rebuild stability by incorporating create_time into the LLM cache 2025-07-03 17:08:29 +08:00
yangdx
6b6d14bc3a fix: Deduplicate entities and relationships in a single chunk with multiple gleaning results during KG rebuild 2025-07-03 13:47:52 +08:00
yangdx
e56734cb8b Refac: Optimize document deletion performance
- Adding chunks_list to doc_status
- Adding llm_cache_list to text_chunks
- Implemented storage types: JsonKV and Redis
2025-07-03 04:18:25 +08:00
yangdx
271722405f feat: Flatten LLM cache structure for improved recall efficiency
Refactored the LLM cache to a flat Key-Value (KV) structure, replacing the previous nested format. The old structure used the 'mode' as a key and stored specific cache content as JSON nested under it. This change significantly enhances cache recall efficiency.
2025-07-02 16:11:53 +08:00
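The move from a nested to a flat layout can be illustrated with a small conversion sketch. The composite key format `mode:hash` and the function name `flatten_cache` are assumptions made for this example; the key scheme actually used by the commit is not stated in this message.

```python
from typing import Any, Dict


def flatten_cache(nested: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Convert {mode: {hash: entry}} into {"mode:hash": entry}."""
    flat: Dict[str, Any] = {}
    for mode, entries in nested.items():
        for args_hash, entry in entries.items():
            # One composite key per entry, so a lookup is a single get()
            # instead of loading the whole per-mode document first.
            flat[f"{mode}:{args_hash}"] = entry
    return flat
```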
yangdx
e70f5a35e5 Refac: Add KG rebuild logging with pipeline status
- Logs detailed progress, including warnings and failures, to the pipeline status.
- Adds counters to report the total number of successfully rebuilt entities and relationships upon completion.
2025-06-29 21:27:12 +08:00
yangdx
8522bfc9dc Optimized logger info 2025-06-28 19:27:36 +08:00