LightRAG

mirror of https://github.com/HKUDS/LightRAG.git synced 2025-11-09 06:13:47 +00:00

Author	SHA1	Message	Date
yangdx	3acb32f547	Add comments explaining chunk deduplication behavior in query context	2025-08-15 02:19:01 +08:00
yangdx	f733ac829c	Remove debug logging statements from query context building	2025-08-14 23:44:34 +08:00
yangdx	4a19d0de25	Add chunk tracking system to monitor chunk sources and frequencies • Track chunk sources (E/R/C types) • Log frequency and order metadata • Preserve chunk_id through processing • Add debug logging for chunk tracking • Handle rerank and truncation operations	2025-08-14 22:58:26 +08:00
yangdx	a8b7890470	Rename chunk selection functions for better clarity	2025-08-14 16:01:13 +08:00
yangdx	3343833571	Remove query params from cache key generation for keyword extraction	2025-08-14 02:36:01 +08:00
yangdx	bac09118d5	Simplify embedding func extraction	2025-08-14 01:09:18 +08:00
yangdx	ac3b5605a1	Refactor logging for relation chunk discovery with dedup info	2025-08-14 00:41:58 +08:00
yangdx	edac10906c	fix: Add total_relation_chunks statistics and improve logging in _find_related_text_unit_from_relations	2025-08-13 23:45:31 +08:00
yangdx	5a40ff654e	Change KG chunk selection default to VECTOR - Set KG_CHUNK_PICK_METHOD default to VECTOR - Update env.example with new config option	2025-08-13 23:10:42 +08:00
yangdx	f1dafa0d01	feat: KG related chunks selection by vector similarity - Add env switch to toggle weighted polling vs vector-similarity strategy - Implement similarity-based sorting with fallback to weighted - Introduce batch vector read API for vector storage - Implement vector store and retrieve function for Nanovector DB - Preserve default behavior (weighted polling selection method)	2025-08-13 18:16:42 +08:00
yangdx	095e0cbfa2	Refac: Add workspace information to all logger output for all storage types	2025-08-12 01:19:09 +08:00
yangdx	cf064579ce	Remove deprecated keyword extraction query methods - Delete query_with_keywords function - Remove kg_query_with_keywords helper - Drop separate keyword extraction methods	2025-08-08 14:59:39 +08:00
yangdx	eded6d1187	Unify document chunks context format in only_need_context query - Update Document Chunks label to include (DC) abbreviation	2025-08-08 00:02:53 +08:00
yangdx	0463963520	fix: include all query parameters in LLM cache hash key generation - Add missing query parameters (top_k, enable_rerank, max_tokens, etc.) to cache key generation in kg_query, naive_query, and extract_keywords_only functions - Add queryparam field to CacheData structure and PostgreSQL storage for debugging - Update PostgreSQL schema with automatic migration for queryparam JSONB column - Prevent incorrect cache hits between queries with different parameters Fixes issue where different query parameters incorrectly shared the same cached results.	2025-08-05 18:03:10 +08:00
yangdx	cb75e6631e	Remove quantized embedding info from LLM cache - Delete quantize_embedding function - Delete dequantize_embedding function - Remove embedding fields from CacheData - Update save_to_cache to exclude embedding data - Clean up unused quantization-related code	2025-08-05 17:58:34 +08:00
yangdx	091f2b42c3	feat(performance): Optimize document deletion with entity/relation index - Introduces an index mapping documents to their corresponding entities and relations. This significantly speeds up `adelete_by_doc_id` by replacing slow graph traversal with a fast key-value lookup. - Refactors the ingestion pipeline (`merge_nodes_and_edges`) to populate this new index. Adds a one-time data migration script to backfill the index for existing data.	2025-08-03 09:19:02 +08:00
yangdx	32af45ff46	refactor: improve JSON parsing reliability with json-repair library Replace regex-based JSON extraction with json-repair for better handling of malformed LLM responses. Remove deprecated JSON parsing utilities and clean up keyword_extraction parameter across LLM providers. - Remove locate_json_string_body_from_string() and convert_response_to_json() - Use json-repair.loads() in extract_keywords_only() for robust parsing - Clean up LLM interfaces and remove unused parameters - Add json-repair dependency	2025-08-01 19:36:20 +08:00
yangdx	598eecd06d	Refactor: Rename llm_model_max_token_size to summary_max_tokens This commit renames the parameter 'llm_model_max_token_size' to 'summary_max_tokens' for better clarity, as it specifically controls the token limit for entity relation summaries.	2025-07-28 00:49:08 +08:00
yangdx	3951a44666	Revert file_path build method, built from related chunks	2025-07-27 21:56:20 +08:00
yangdx	f2d051eea5	Fix: Improve keyword extraction prompt for robust JSON output. * Emphasize strict JSON output in key extraction prompt * Clean up prompt examples in key extraction prompt * Log raw LLM response on JSON error	2025-07-27 21:10:47 +08:00
yangdx	99e3812c38	refactor: unify file_path handling across merge and rebuild functions - Replace simple string concatenation with build_file_path() in: - _merge_edges_then_upsert - _rebuild_single_entity - _rebuild_single_relationship - Ensures consistent deduplication, length limiting, and error handling - Aligns with existing _merge_nodes_then_upsert implementation	2025-07-27 12:37:24 +08:00
yangdx	7b915b34f6	Refactor: move build_file_path function from operate.py to utils.py	2025-07-26 10:52:59 +08:00
yangdx	c8c3545454	refactor: extract file path length limit to shared constant • Add DEFAULT_MAX_FILE_PATH_LENGTH constant • Replace hardcoded 4090 in Milvus impl	2025-07-26 10:45:03 +08:00
yangdx	a943265257	fix: preserve file path order in build_file_path function	2025-07-26 10:21:32 +08:00
yangdx	6efa8ab263	Improve file path length warning message clarity and urgency • Change debug to warning level • Simplify message wording	2025-07-26 10:00:18 +08:00
xuewei	56c3cb2dbe	Improve build_file_path log	2025-07-26 08:38:02 +08:00
xuewei	b4da3de7d9	Improve file_path drop policy	2025-07-26 00:46:02 +08:00
yangdx	d78fda1d89	Optimize logger message	2025-07-24 04:31:06 +08:00
yangdx	3075691f72	Refactor: move reranking utilities from operate.py to utils.py • Move apply_rerank_if_enabled to utils • Move process_chunks_unified to utils	2025-07-24 03:33:38 +08:00
yangdx	5a5d32dc32	Optimize logger message	2025-07-24 02:13:39 +08:00
yangdx	42710221f5	Update log messages	2025-07-24 01:31:49 +08:00
yangdx	02f79508e0	Optimize context building with weighted polling and round-robin data selection	2025-07-24 01:18:21 +08:00
yangdx	7d96ca98f7	Fix linting	2025-07-23 16:16:37 +08:00
yangdx	6cc9411c86	fix: handle empty tasks list in merge_nodes_and_edges to prevent ValueError - Add empty tasks check before calling asyncio.wait() - Return early with logging when no entities/relationships to process	2025-07-23 16:06:47 +08:00
yangdx	2d41e5313a	Remove redundant tokenizer checks	2025-07-23 10:19:45 +08:00
yangdx	ce9dac9bcf	vdb does not store rank any more	2025-07-21 17:04:23 +08:00
yangdx	cb3bf3291c	Fix: rename rerank parameter from top_k to top_n The change aligns with the API parameter naming used by Jina and Cohere rerank services, ensuring consistency and clarity.	2025-07-20 00:26:27 +08:00
yangdx	7e3914052d	Optimize text chunk retrieval with batch fetching - Replace individual chunk fetches with batch get - Simplify deduplication logic - Improve error handling for missing data	2025-07-19 21:01:03 +08:00
xuewei	7acca59dfb	Improve query for find_text_unit	2025-07-19 17:27:28 +08:00
yangdx	cba97c62fe	Merge branch 'fix-memgraph' into fix-keyed-lock	2025-07-19 11:55:24 +08:00
yangdx	2d3a530ce8	Fix: Implemented entity-keyed locks for edge merging operations to ensure robust race condition protection - Replacing string concatenation with direct list passing for lock keys - Eliminating deadlock risks by removing the lock around node insertion within the edge merge	2025-07-19 11:48:19 +08:00
yangdx	9f5399c2f1	Replace tenacity retries with manual Memgraph transaction retries - Implement manual retry logic - Add exponential backoff with jitter - Improve error handling for transient errors	2025-07-19 11:31:21 +08:00
yangdx	6e1657a771	Improve thread safety for relationship rebuilding - Sort src and tgt for consistent lock keys - Maintain order-independent locking	2025-07-19 10:25:48 +08:00
yangdx	05bc5cfb64	Improve task execution with early failure detection - Add early failure detection for async tasks - Cancel pending tasks on first exception	2025-07-19 10:14:22 +08:00
yangdx	12d4f12e57	fix: sort edge_key components in _locked_process_edges for consistent locking - Ensures bidirectional relationships use same lock key - Maintains thread safety for knowledge graph edge operations	2025-07-19 07:36:50 +08:00
yangdx	be2d938c84	Fix file path handling in graph operations - Filter out empty file paths - Handle missing file_path fields	2025-07-17 18:33:14 +08:00
yangdx	7184c7b3ab	fix: change default edge weight from 0.0 to 1.0 in entity extraction and graph storage - Update extract_entities function in operate.py to use 1.0 as default weight - Fix Neo4j implementation to use 1.0 instead of 0.0 for missing edge weights - Fix Memgraph implementation to use 1.0 instead of 0.0 for missing edge weights - Ensures consistent non-zero default weights across all graph storage backends	2025-07-17 11:30:49 +08:00
yangdx	b1276a079f	Fix linting	2025-07-15 23:57:24 +08:00
yangdx	5f7cb437e8	Centralize query parameters into LightRAG class This commit refactors query parameter management by consolidating settings like `top_k`, token limits, and thresholds into the `LightRAG` class, and consistently sourcing parameters from a single location.	2025-07-15 23:56:49 +08:00
zrguo	3ead0489b8	Remove "rank", "weight", "keywords"	2025-07-15 21:47:33 +08:00
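
The locking fixes in commits 12d4f12e57, 6e1657a771 and 2d3a530ce8 share one idea: sort the two endpoint names so that (src, tgt) and (tgt, src) map to the same lock keys, and pass those keys as a list rather than a concatenated string. The sketch below illustrates that idea with plain asyncio. It is a minimal sketch under stated assumptions, not LightRAG's code: `KeyedLocks`, `edge_key` and `merge_edge` are hypothetical names, and an in-memory dict stands in for graph storage. Acquiring the per-entity locks in sorted order is this sketch's own way of avoiding lock-order deadlocks.

```python
import asyncio
import contextlib


class KeyedLocks:
    """Hypothetical keyed-lock registry: one asyncio.Lock per entity name."""

    def __init__(self) -> None:
        self._locks: dict[str, asyncio.Lock] = {}

    @contextlib.asynccontextmanager
    async def hold(self, keys: list[str]):
        # Deduplicate (self-loops) and sort so every caller acquires locks in
        # the same global order, which rules out lock-order deadlocks.
        async with contextlib.AsyncExitStack() as stack:
            for key in sorted(set(keys)):
                lock = self._locks.setdefault(key, asyncio.Lock())
                await stack.enter_async_context(lock)
            yield  # all locks held here; released in reverse order on exit


def edge_key(src: str, tgt: str) -> tuple[str, str]:
    """Order-independent key: (A, B) and (B, A) name the same undirected edge."""
    a, b = sorted((src, tgt))
    return (a, b)


async def merge_edge(locks: KeyedLocks, edges: dict, src: str, tgt: str, description: str) -> None:
    # Lock keys are passed as a list of entity names, not a joined string.
    async with locks.hold([src, tgt]):
        key = edge_key(src, tgt)
        current = edges.get(key, [])
        await asyncio.sleep(0)  # simulate storage I/O between read and write
        edges[key] = current + [description]


async def main() -> dict:
    locks, edges = KeyedLocks(), {}
    # Two concurrent merges of the same edge, given in opposite directions.
    await asyncio.gather(
        merge_edge(locks, edges, "A", "B", "x"),
        merge_edge(locks, edges, "B", "A", "y"),
    )
    return edges


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Both descriptions end up under the single key `("A", "B")` because the second merge waits for the first. Without the locks, both coroutines would read the empty list before either writes, and one update would be lost.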

589 Commits