LightRAG

mirror of https://github.com/HKUDS/LightRAG.git synced 2025-11-10 06:44:05 +00:00

Author	SHA1	Message	Date
yangdx	cb75e6631e	Remove quantized embedding info from LLM cache - Delete quantize_embedding function - Delete dequantize_embedding function - Remove embedding fields from CacheData - Update save_to_cache to exclude embedding data - Clean up unused quantization-related code	2025-08-05 17:58:34 +08:00
yangdx	091f2b42c3	feat(performance): Optimize document deletion with entity/relation index - Introduces an index mapping documents to their corresponding entities and relations. This significantly speeds up `adelete_by_doc_id` by replacing slow graph traversal with a fast key-value lookup. - Refactors the ingestion pipeline (`merge_nodes_and_edges`) to populate this new index. Adds a one-time data migration script to backfill the index for existing data.	2025-08-03 09:19:02 +08:00
yangdx	32af45ff46	refactor: improve JSON parsing reliability with json-repair library Replace regex-based JSON extraction with json-repair for better handling of malformed LLM responses. Remove deprecated JSON parsing utilities and clean up keyword_extraction parameter across LLM providers. - Remove locate_json_string_body_from_string() and convert_response_to_json() - Use json-repair.loads() in extract_keywords_only() for robust parsing - Clean up LLM interfaces and remove unused parameters - Add json-repair dependency	2025-08-01 19:36:20 +08:00
yangdx	598eecd06d	Refactor: Rename llm_model_max_token_size to summary_max_tokens This commit renames the parameter 'llm_model_max_token_size' to 'summary_max_tokens' for better clarity, as it specifically controls the token limit for entity relation summaries.	2025-07-28 00:49:08 +08:00
yangdx	3951a44666	Revert file_path build method, built from related chunks	2025-07-27 21:56:20 +08:00
yangdx	f2d051eea5	Fix: Improve keyword extraction prompt for robust JSON output. * Emphasize strict JSON output in key extration prompt * Clean up prompt examples in key extration prompt * Log raw LLM response on JSON error	2025-07-27 21:10:47 +08:00
yangdx	99e3812c38	refactor: unify file_path handling across merge and rebuild functions - Replace simple string concatenation with build_file_path() in: - _merge_edges_then_upsert - _rebuild_single_entity - _rebuild_single_relationship - Ensures consistent deduplication, length limiting, and error handling - Aligns with existing _merge_nodes_then_upsert implementation	2025-07-27 12:37:24 +08:00
yangdx	7b915b34f6	Refactor: move build_file_path function from operate.py to utils.py	2025-07-26 10:52:59 +08:00
yangdx	c8c3545454	refactor: extract file path length limit to shared constant • Add DEFAULT_MAX_FILE_PATH_LENGTH constant • Replace hardcoded 4090 in Milvus impl	2025-07-26 10:45:03 +08:00
yangdx	a943265257	fix: preserve file path order in build_file_path function	2025-07-26 10:21:32 +08:00
yangdx	6efa8ab263	Improve file path length warning message clarity and urgency • Change debug to warning level • Simplify message wording	2025-07-26 10:00:18 +08:00
xuewei	56c3cb2dbe	Improve build_file_path log	2025-07-26 08:38:02 +08:00
xuewei	b4da3de7d9	Improve file_path drop policy	2025-07-26 00:46:02 +08:00
yangdx	d78fda1d89	Optimize logger message	2025-07-24 04:31:06 +08:00
yangdx	3075691f72	Refactor: move reranking utilities from operate.py to utils.py • Move apply_rerank_if_enabled to utils • Move process_chunks_unified to utils	2025-07-24 03:33:38 +08:00
yangdx	5a5d32dc32	Optimize logger message	2025-07-24 02:13:39 +08:00
yangdx	42710221f5	Update log messages	2025-07-24 01:31:49 +08:00
yangdx	02f79508e0	Optimize context building with weighted polling and round-robin data selection	2025-07-24 01:18:21 +08:00
yangdx	7d96ca98f7	Fix linting	2025-07-23 16:16:37 +08:00
yangdx	6cc9411c86	fix: handle empty tasks list in merge_nodes_and_edges to prevent ValueError - Add empty tasks check before calling asyncio.wait() - Return early with logging when no entities/relationships to process	2025-07-23 16:06:47 +08:00
yangdx	2d41e5313a	Remove redundant tokenizer checks	2025-07-23 10:19:45 +08:00
yangdx	ce9dac9bcf	vdb does not store rank any more	2025-07-21 17:04:23 +08:00
yangdx	cb3bf3291c	Fix: rename rerank parameter from top_k to top_n The change aligns with the API parameter naming used by Jina and Cohere rerank services, ensuring consistency and clarity.	2025-07-20 00:26:27 +08:00
yangdx	7e3914052d	Optimize text chunk retrieval with batch fetching - Replace individual chunk fetches with batch get - Simplify deduplication logic - Improve error handling for missing data	2025-07-19 21:01:03 +08:00
xuewei	7acca59dfb	Improve query for find_text_unit	2025-07-19 17:27:28 +08:00
yangdx	cba97c62fe	Merge branch 'fix-memgraph' into fix-keyed-lock	2025-07-19 11:55:24 +08:00
yangdx	2d3a530ce8	Fix: Implemented entity-keyed locks for edge merging operations to ensure robust race condition protection - Replacing string concatenation with direct list passing for lock keys - Eliminating deadlock risks by removing the lock around node insertion within the edge merge	2025-07-19 11:48:19 +08:00
yangdx	9f5399c2f1	Replace tenacity retries with manual Memgraph transaction retries - Implement manual retry logic - Add exponential backoff with jitter - Improve error handling for transient errors	2025-07-19 11:31:21 +08:00
yangdx	6e1657a771	Improve thread safety for relationship rebuilding - Sort src and tgt for consistent lock keys - Maintain order-independent locking	2025-07-19 10:25:48 +08:00
yangdx	05bc5cfb64	Improve task execution with early failure detection - Add early failure detection for async tasks - Cancel pending tasks on first exception	2025-07-19 10:14:22 +08:00
yangdx	12d4f12e57	fix: sort edge_key components in _locked_process_edges for consistent locking - Ensures bidirectional relationships use same lock key - Maintains thread safety for knowledge graph edge operations	2025-07-19 07:36:50 +08:00
yangdx	be2d938c84	Fix file path handling in graph operations - Filter out empty file paths - Handle missing file_path fields	2025-07-17 18:33:14 +08:00
yangdx	7184c7b3ab	fix: change default edge weight from 0.0 to 1.0 in entity extraction and graph storage - Update extract_entities function in operate.py to use 1.0 as default weight - Fix Neo4j implementation to use 1.0 instead of 0.0 for missing edge weights - Fix Memgraph implementation to use 1.0 instead of 0.0 for missing edge weights - Ensures consistent non-zero default weights across all graph storage backends	2025-07-17 11:30:49 +08:00
yangdx	b1276a079f	Fix linting	2025-07-15 23:57:24 +08:00
yangdx	5f7cb437e8	Centralize query parameters into LightRAG class This commit refactors query parameter management by consolidating settings like `top_k`, token limits, and thresholds into the `LightRAG` class, and consistently sourcing parameters from a single location.	2025-07-15 23:56:49 +08:00
zrguo	3ead0489b8	Remove "rank", "weight", "keywords"	2025-07-15 21:47:33 +08:00
zrguo	1541034816	Add DEFAULT_RELATED_CHUNK_NUMBER	2025-07-15 21:35:12 +08:00
zrguo	42f1fd60f4	Update operate.py	2025-07-15 18:59:52 +08:00
zrguo	29e82723e6	Update operate.py	2025-07-15 18:57:57 +08:00
yangdx	1927cb2685	Fix linting	2025-07-15 17:24:57 +08:00
yangdx	47341d3a71	Merge branch 'main' into rerank	2025-07-15 16:12:33 +08:00
yangdx	e8e1f6ab56	feat: centralize environment variable defaults in constants.py	2025-07-15 16:11:50 +08:00
yangdx	ccc2a20071	feat: remove deprecated MAX_TOKEN_SUMMARY parameter to prevent LLM output truncation - Remove MAX_TOKEN_SUMMARY parameter and related configurations - Eliminate forced token-based truncation in entity/relationship descriptions - Switch to fragment-count based summarization logic using FORCE_LLM_SUMMARY_ON_MERGE - Update FORCE_LLM_SUMMARY_ON_MERGE default from 6 to 4 for better summarization - Clean up documentation, environment examples, and API display code - Preserve backward compatibility by graceful parameter removal This change resolves issues where LLMs were forcibly truncating entity relationship descriptions mid-sentence, leading to incomplete and potentially inaccurate knowledge graph content. The new approach allows LLMs to generate complete descriptions while still providing summarization when multiple fragments need to be merged. Breaking Change: None - parameter removal is backward compatible Fixes: Entity relationship description truncation issues	2025-07-15 12:26:33 +08:00
zrguo	7c882313bb	remove chunk_rerank_top_k	2025-07-15 11:52:34 +08:00
zrguo	86a0a4872e	Update operate.py	2025-07-15 10:56:48 +08:00
zrguo	7edf087baa	Update operate.py	2025-07-14 18:43:22 +08:00
zrguo	bbd91d3a18	Update operate.py	2025-07-14 16:37:25 +08:00
zrguo	4e425b1b59	Revert "update from main" This reverts commit 1d0376d6a926ef60d641af4406dacf5b8bbb430f.	2025-07-14 16:29:00 +08:00
zrguo	1d0376d6a9	update from main	2025-07-14 16:27:49 +08:00
zrguo	c9cbd2d3e0	Merge branch 'main' into rerank	2025-07-14 16:24:29 +08:00

1 2 3 4 5 ...

425 Commits