LightRAG

mirror of https://github.com/HKUDS/LightRAG.git synced 2025-11-11 07:13:51 +00:00

Author	SHA1	Message	Date
yangdx	0a62f02e84	Improve edge logging format and exception prefixes	2025-09-06 08:35:52 +08:00
yangdx	6be462511f	Add error prefixing for better debugging context in async operations * Add create_prefixed_exception utility * Prefix entity processing errors * Prefix relationship processing errors * Prefix chunk extraction progress info * Maintain original exception chains	2025-09-05 21:28:00 +08:00
yangdx	385668dec5	Fix malformed tuple delimiters in extraction result processing	2025-09-05 17:14:42 +08:00
yangdx	83b54975a2	fix: resolve "Task exception was never retrieved" warnings in async task handling - Handle multiple simultaneous exceptions correctly - Maintain fast-fail behavior while ensuring proper exception cleanup to prevent asyncio warnings	2025-09-04 12:40:41 +08:00
yangdx	7ef2f0dff6	Add VDB error handling with retries for data consistency - Add safe_vdb_operation_with_exception util - Wrap VDB ops in entity/relationship code - Ensure exceptions propagate on failure - Add retry logic with configurable delays	2025-09-03 21:15:09 +08:00
yangdx	c86f863fa4	feat: optimize entity extraction for smaller LLMs Simplify entity relationship extraction process to improve compatibility and performance with smaller, less capable language models. Changes: - Remove iterative gleaning loop with LLM-based continuation decisions - Simplify to single gleaning pass when entity_extract_max_gleaning > 0 - Streamline entity extraction prompts with clearer instructions - Add explicit completion delimiter signals in all examples	2025-09-03 10:33:01 +08:00
yangdx	5b2deccbef	Improve text normalization and add entity type capitalization - Capitalize entity types with .title() - Add non-breaking space handling - Add narrow non-breaking space regex	2025-09-02 02:51:41 +08:00
yangdx	3f8a9abe7e	Refactor extraction result processing to reduce code duplication • Extract shared processing logic • Add delimiter pattern fixes • Improve bracket standardization	2025-09-02 01:22:29 +08:00
yangdx	3cdc98f366	Improve extraction parsing with better bracket handling and delimiter fixes • Standardize Chinese/English brackets • Fix incomplete tuple delimiters • Remove duplicate delimiter fix code • Support mixed bracket formats • Enhance record parsing robustness	2025-09-02 00:26:04 +08:00
yangdx	8bbf307aeb	Fix regex to match multiline content in extraction parsing • Remove non-greedy quantifier • Add DOTALL flag for multiline matching • Apply to both parsing functions • Enable cross-line content extraction	2025-09-01 10:35:06 +08:00
yangdx	7baeb186c6	Fix regex to use non-greedy matching for parentheses extraction	2025-09-01 10:10:45 +08:00
Tong Da	dc7ce98c7e	Add search interface to lightrag.	2025-09-01 02:40:40 +08:00
Tong Da	14fe3e4387	remove unused import	2025-09-01 02:24:56 +08:00
Tong Da	a60a8704ba	Add search method to lightrag. Search retrieves structured objects (entities, relations, chunks) in their raw data format.	2025-09-01 02:19:58 +08:00
yangdx	5fd7682f16	Fix LLM output instability for <|> tuple delimiter - Replace <||> with <|> - Replace < | > with <|> - Apply fix in both functions - Handle delimiter variations - Improve parsing reliability	2025-09-01 01:22:27 +08:00
yangdx	4e751e0653	refac: Enhance extraction with improved prompts and parser - Prompts: Restructured prompts with clearer steps and quality guidelines. Simplified the relationship tuple by removing `relationship_strength` - Model: Updated default entity types to be more comprehensive and consistently capitalized (e.g., `Location`, `Product`)	2025-08-31 22:24:11 +08:00
yangdx	75de40da41	Fix typo in relationship extraction log messages	2025-08-31 17:45:16 +08:00
yangdx	97c9600085	Improve extraction error handling and field validation • Add field count validation warnings • Fix relationship field count (5→6) • Change error logs to warnings	2025-08-31 17:33:42 +08:00
yangdx	b747417961	feat: enhance extraction text sanitization and normalization - Improve handling of redundant quotes in entity and relation names, types, and keywords - Add HTML tag cleaning and Chinese symbol conversion - Filter out short numeric content and malformed text - Enhance entity type validation with character filtering	2025-08-31 13:17:20 +08:00
yangdx	d4bbc5dea9	refactor: Merge multi-step text sanitization into single function	2025-08-31 10:36:56 +08:00
yangdx	03d0fa3014	perf: add optional query_embedding parameter to avoid redundant embedding calls	2025-08-29 18:15:45 +08:00
yangdx	a923d378dd	Remove deprecated ID-based filtering from vector storage queries - Remove ids param from QueryParam - Simplify BaseVectorStorage.query signature - Update all vector storage implementations - Streamline PostgreSQL query templates - Remove ID filtering from operate.py calls	2025-08-29 17:06:48 +08:00
yangdx	99e28e815b	fix: prevent document processing failures from UTF-8 surrogate characters - Change sanitize_text_for_encoding to fail-fast instead of returning error placeholders - Add strict UTF-8 cleaning pipeline to entity/relationship extraction - Skip problematic entities/relationships instead of corrupting data Fixes document processing crashes when encountering surrogate characters (U+D800-U+DFFF)	2025-08-27 23:52:39 +08:00
yangdx	6a2a592224	Fix linting	2025-08-27 12:51:50 +08:00
yangdx	28e07c89f9	Fix linting	2025-08-27 12:35:51 +08:00
yangdx	2ccc39de9a	Fix language fallback in summarize error	2025-08-27 12:34:27 +08:00
yangdx	ff0a18e08c	Unify SUMMARY_LANGUAGE and ENTITY_TYPES implementation method	2025-08-27 12:23:22 +08:00
Thibo Rosemplatt	c3aabfc251	Merge branch 'main' into entityTypesServerSupport	2025-08-26 21:48:20 +02:00
yangdx	d3623cc9ae	fix: resolve infinite loop risk in _handle_entity_relation_summary - Ensure oversized descriptions are force-merged with subsequent ones - Add len(current_list) <= 2 termination condition to guarantee convergence - Implement token-based truncation in _summarize_descriptions to prevent overflow	2025-08-26 21:58:31 +08:00
yangdx	79e0226b2b	Refactor: move force_llm_summary_on_merge to global_config access - Remove parameter from function signature - Access from global_config instead - Improve code consistency	2025-08-26 18:02:39 +08:00
yangdx	6bcfe696ee	feat: add output length recommendation and description type to LLM summary - Add SUMMARY_LENGTH_RECOMMENDED parameter (600 tokens) - Optimize prompt template for LLM summary	2025-08-26 14:41:12 +08:00
yangdx	025f70089a	Simplify status messages in knowledge rebuild operations	2025-08-26 04:26:15 +08:00
yangdx	9eb2be79b8	feat: track actual LLM usage in entity/relation merging - Modified _handle_entity_relation_summary to return tuple[str, bool] - Updated merge functions to log "LLMmerg" vs "Merging" based on actual LLM usage - Replaced hardcoded fragment count prediction with real-time LLM usage tracking	2025-08-26 03:56:18 +08:00
yangdx	de2daf6565	refac: Rename summary_max_tokens to summary_context_size, comprehensive parameter validation for summary configuration - Update algorithm logic in operate.py for better token management - Fix health endpoint to use correct parameter names	2025-08-26 01:35:50 +08:00
yangdx	91767ffcee	Improve warning message formatting in entity/relationship rebuild	2025-08-25 21:55:29 +08:00
yangdx	15cdd0dd8f	fix: Sort cached extraction results by the create_time within each chunk This ensures the KG rebuilds maintain the original creation order of the first extraction result for each chunk.	2025-08-25 21:41:33 +08:00
yangdx	882d6857d8	feat: Implement map-reduce summarization to handle merging a large number of descriptions	2025-08-25 21:03:16 +08:00
yangdx	cac8e189e7	Remove redundant entity vector deletion before upsert	2025-08-25 17:18:51 +08:00
yangdx	9b6de7512d	Optimize the stability of description merging order	2025-08-25 17:10:51 +08:00
yangdx	31f4f96944	Exclude conversation history from context length calculation	2025-08-25 12:43:34 +08:00
yangdx	f688e95f56	Add warning for vector chunks missing chunk_id	2025-08-25 12:42:25 +08:00
yangdx	b6aedba7ae	Add logging for empty naive query results in vector context	2025-08-25 12:21:31 +08:00
yangdx	f1ff5cf93f	fix: initialize truncated_chunks variable in _build_query_context Prevents the "local variable 'truncated_chunks' referenced before assignment" error	2025-08-25 11:56:56 +08:00
Thibo Rosemplatt	d054ec5d00	Added entity_types as a user defined variable (via .env)	2025-08-23 20:16:11 +02:00
yangdx	9bc349ddd6	Improve Empty Keyword Handling logic	2025-08-23 11:50:58 +08:00
yangdx	b5c230abdd	optimize: avoid duplicate embedding calls in _build_query_context Reduces API costs and improves query performance while maintaining backward compatibility.	2025-08-21 16:49:24 +08:00
yangdx	2a7fec2873	Optimize keyword extraction prompt, and remove conversation history from keyword extraction. - Remove history context processing - Update prompt to focus on single query - Clarify high/low level keyword types - Improve JSON output instructions - Add edge case handling guidance	2025-08-18 23:35:04 +08:00
yangdx	d3fde60938	refactor: remove file_path and created_at from context, improve token truncation - Remove file_path and created_at fields from entity and relationship contexts - Update token truncation to include full JSON serialization instead of content only	2025-08-18 18:30:09 +08:00
yangdx	1e2d5252d7	Add get_vectors_by_ids method and filter out vector data from query results	2025-08-15 16:32:26 +08:00
yangdx	6cab68bb47	Improve KG chunk selection documentation and configuration clarity	2025-08-15 10:09:44 +08:00
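
Commit 5fd7682f16 above describes replacing malformed variants of the `<|>` tuple delimiter (`<||>` and `< | >`) in LLM extraction output. A minimal sketch of that kind of normalization might look like the following. The function name, constant, and regex are hypothetical illustrations and are not taken from the LightRAG source.

```python
import re

# Canonical tuple delimiter named in the commit message.
TUPLE_DELIMITER = "<|>"

# Hypothetical pattern (an assumption, not LightRAG's own): "<", optional
# whitespace, one or more "|" characters optionally separated by whitespace,
# optional whitespace, then ">". Covers "<||>" and "< | >" as well as "<|>".
_DELIMITER_VARIANTS = re.compile(r"<\s*\|(?:\s*\|)*\s*>")


def normalize_tuple_delimiters(text: str) -> str:
    """Rewrite malformed delimiter variants to the canonical "<|>" form."""
    return _DELIMITER_VARIANTS.sub(TUPLE_DELIMITER, text)


if __name__ == "__main__":
    raw = '("entity"<||>"Alice"< | >"Person")'
    print(normalize_tuple_delimiters(raw))
```

A single regex keeps the sketch short; the commit message itself lists two separate replacements, so the actual change may simply use two string substitutions.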

589 Commits