yangdx
0a62f02e84
Improve edge logging format and exception prefixes
2025-09-06 08:35:52 +08:00
yangdx
6be462511f
Add error prefixing for better debugging context in async operations
...
* Add create_prefixed_exception utility
* Prefix entity processing errors
* Prefix relationship processing errors
* Prefix chunk extraction progress info
* Maintain original exception chains
2025-09-05 21:28:00 +08:00
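The entry names `create_prefixed_exception` but not its signature. A minimal sketch, assuming it takes the original exception plus a context prefix and returns a new exception of the same type with the original kept as `__cause__`. The signature and the `RuntimeError` fallback are assumptions, not the project's code:

```python
def create_prefixed_exception(original: Exception, prefix: str) -> Exception:
    """Hypothetical sketch: same exception type, message prefixed with context,
    original exception kept as __cause__ so the chain is preserved."""
    try:
        prefixed = type(original)(f"{prefix}: {original}")
    except TypeError:
        # Some exception types need extra constructor arguments; fall back.
        prefixed = RuntimeError(f"{prefix}: {type(original).__name__}: {original}")
    prefixed.__cause__ = original
    return prefixed


def extract_entity(name: str) -> None:
    # Stand-in for a failing processing step.
    raise ValueError("empty description")


try:
    try:
        extract_entity("Acme")
    except Exception as e:
        raise create_prefixed_exception(e, "Entity 'Acme'") from e
except ValueError as err:
    print(err)  # Entity 'Acme': empty description
```

Keeping the original type means existing `except ValueError` handlers still match, while the prefix tells you which entity, relationship or chunk failed.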
yangdx
385668dec5
Fix malformed tuple delimiters in extraction result processing
2025-09-05 17:14:42 +08:00
yangdx
83b54975a2
fix: resolve "Task exception was never retrieved" warnings in async task handling
...
- Handle multiple simultaneous exceptions correctly
- Maintain fast-fail behavior while ensuring proper exception cleanup to
prevent asyncio warnings
2025-09-04 12:40:41 +08:00
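asyncio logs "Task exception was never retrieved" when a task that failed is garbage-collected without anyone reading its exception. A sketch of one fast-fail pattern that avoids it, using only standard `asyncio` calls; the helper name `run_fast_fail` is hypothetical and this is not necessarily how the commit does it:

```python
import asyncio
from typing import Any, Coroutine, List


async def run_fast_fail(coros: List[Coroutine[Any, Any, Any]]) -> List[Any]:
    """Run coroutines concurrently and fail fast on the first exception, while
    retrieving every task's outcome so asyncio does not log
    'Task exception was never retrieved'."""
    tasks = [asyncio.create_task(c) for c in coros]
    if not tasks:
        return []
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_EXCEPTION)

    # Cancel the unfinished tasks and await them; return_exceptions=True
    # collects their CancelledError (or late errors) instead of raising.
    for t in pending:
        t.cancel()
    if pending:
        await asyncio.gather(*pending, return_exceptions=True)

    # Calling .exception() marks each failure as retrieved; several tasks may
    # have failed at the same time, so check all of them in submission order.
    errors = [
        t.exception()
        for t in tasks
        if t in done and not t.cancelled() and t.exception() is not None
    ]
    if errors:
        raise errors[0]
    return [t.result() for t in tasks]


async def ok(x: int) -> int:
    await asyncio.sleep(0.01)
    return x


async def boom() -> None:
    raise RuntimeError("chunk extraction failed")


async def main() -> None:
    print(await run_fast_fail([ok(1), ok(2)]))  # [1, 2]
    try:
        await run_fast_fail([ok(1), boom(), ok(3)])
    except RuntimeError as e:
        print("caught:", e)  # caught: chunk extraction failed


asyncio.run(main())
```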
yangdx
7ef2f0dff6
Add VDB error handling with retries for data consistency
...
- Add safe_vdb_operation_with_exception util
- Wrap VDB ops in entity/relationship code
- Ensure exceptions propagate on failure
- Add retry logic with configurable delays
2025-09-03 21:15:09 +08:00
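The entry names `safe_vdb_operation_with_exception` without giving its parameters. A sketch assuming it wraps a zero-argument async callable, retries a fixed number of times with a configurable delay, and re-raises when every attempt fails; the parameter names and defaults are hypothetical:

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


async def safe_vdb_operation_with_exception(
    operation: Callable[[], Awaitable[T]],
    operation_name: str,
    max_retries: int = 3,
    retry_delay: float = 0.2,
) -> T:
    """Hypothetical sketch: retry an async vector-DB call, then re-raise so the
    caller fails instead of leaving graph and vector stores out of sync."""
    last_exc: Optional[Exception] = None
    for attempt in range(1, max_retries + 1):
        try:
            return await operation()
        except Exception as e:
            last_exc = e
            logger.warning("%s failed (attempt %d/%d): %s",
                           operation_name, attempt, max_retries, e)
            if attempt < max_retries:
                await asyncio.sleep(retry_delay)  # configurable delay between attempts
    raise RuntimeError(
        f"{operation_name} failed after {max_retries} attempts"
    ) from last_exc


calls = {"n": 0}


async def flaky_upsert() -> str:
    # Stand-in VDB call that fails twice, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("vector DB unavailable")
    return "ok"


print(asyncio.run(safe_vdb_operation_with_exception(
    flaky_upsert, "upsert entity Acme", retry_delay=0.01)))  # ok
```

Raising after the last attempt, rather than logging and continuing, is what keeps a failed vector write from silently diverging from the graph store.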
yangdx
c86f863fa4
feat: optimize entity extraction for smaller LLMs
...
Simplify entity relationship extraction process to improve compatibility
and performance with smaller, less capable language models.
Changes:
- Remove iterative gleaning loop with LLM-based continuation decisions
- Simplify to single gleaning pass when entity_extract_max_gleaning > 0
- Streamline entity extraction prompts with clearer instructions
- Add explicit completion delimiter signals in all examples
2025-09-03 10:33:01 +08:00
yangdx
5b2deccbef
Improve text normalization and add entity type capitalization
...
- Capitalize entity types with .title()
- Add non-breaking space handling
- Add narrow non-breaking space regex
2025-09-02 02:51:41 +08:00
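A minimal sketch of the normalization listed above, assuming entity types arrive as short strings; the function name is hypothetical:

```python
import re


def normalize_entity_type(raw: str) -> str:
    """Sketch: turn non-breaking (U+00A0) and narrow non-breaking (U+202F)
    spaces into ordinary spaces, collapse whitespace, then title-case."""
    text = re.sub(r"[\u00a0\u202f]", " ", raw)
    text = re.sub(r"\s+", " ", text).strip()
    return text.title()


print(normalize_entity_type("organization"))                 # Organization
print(normalize_entity_type("natural\u00a0\u202flocation"))  # Natural Location
```

One side effect of `.title()` to be aware of: it lowercases every letter after the first in a word, so a camel-cased type such as `GeoLocation` becomes `Geolocation`.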
yangdx
3f8a9abe7e
Refactor extraction result processing to reduce code duplication
...
• Extract shared processing logic
• Add delimiter pattern fixes
• Improve bracket standardization
2025-09-02 01:22:29 +08:00
yangdx
3cdc98f366
Improve extraction parsing with better bracket handling and delimiter fixes
...
• Standardize Chinese/English brackets
• Fix incomplete tuple delimiters
• Remove duplicate delimiter fix code
• Support mixed bracket formats
• Enhance record parsing robustness
2025-09-02 00:26:04 +08:00
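A sketch of bracket standardization before parsing, assuming each record is wrapped in parentheses and fields are separated by `<|>`. Only full-width parentheses are mapped here; the exact set of brackets the commit handles is not listed, so the table is an assumption:

```python
import re
from typing import List

# Full-width (Chinese) parentheses mapped to ASCII (assumed subset).
BRACKETS = str.maketrans({"\uff08": "(", "\uff09": ")"})


def parse_record(record: str) -> List[str]:
    """Sketch: normalize brackets, take everything between the outermost
    parentheses, and split on the <|> tuple delimiter."""
    m = re.search(r"\((.*)\)", record.translate(BRACKETS), re.DOTALL)
    if m is None:
        return []
    return [field.strip().strip('"') for field in m.group(1).split("<|>")]


# Full-width, ASCII and mixed forms all parse the same way.
for rec in ('\uff08"entity"<|>"Acme"<|>"Organization"\uff09',
            '("entity"<|>"Acme"<|>"Organization")',
            '\uff08"entity"<|>"Acme"<|>"Organization")'):
    print(parse_record(rec))  # ['entity', 'Acme', 'Organization']
```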
yangdx
8bbf307aeb
Fix regex to match multiline content in extraction parsing
...
• Remove non-greedy quantifier
• Add DOTALL flag for multiline matching
• Apply to both parsing functions
• Enable cross-line content extraction
2025-09-01 10:35:06 +08:00
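Why the quantifier and the `DOTALL` flag matter, shown on a single record that has already been split off from its neighbours (an assumption of this sketch) and that contains both nested parentheses and a newline:

```python
import re

record = ('("entity"<|>"Acme (Holdings)"<|>"Organization"<|>'
          '"Founded in 1999.\nHeadquartered in Oslo.")')

non_greedy = re.search(r"\((.*?)\)", record)                # stops at the first ")"
greedy_single_line = re.search(r"\((.*)\)", record)         # "." cannot cross "\n"
greedy_dotall = re.search(r"\((.*)\)", record, re.DOTALL)  # reaches the final ")"

print(non_greedy.group(1))          # "entity"<|>"Acme (Holdings
print(greedy_single_line.group(1))  # "entity"<|>"Acme (Holdings
print(greedy_dotall.group(1))       # full record body, both lines
```

Greedy matching is only safe once records are isolated; applied to a whole multi-record response it would swallow everything between the first "(" and the last ")".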
yangdx
7baeb186c6
Fix regex to use non-greedy matching for parentheses extraction
2025-09-01 10:10:45 +08:00
Tong Da
dc7ce98c7e
Add search interface to lightrag.
2025-09-01 02:40:40 +08:00
Tong Da
14fe3e4387
remove unused import
2025-09-01 02:24:56 +08:00
Tong Da
a60a8704ba
Add search method to lightrag. Search retrieves structured objects (entities, relations, chunks) in their raw data format.
2025-09-01 02:19:58 +08:00
yangdx
5fd7682f16
Fix LLM output instability for <|> tuple delimiter
...
- Replace <||> with <|>
- Replace < | > with <|>
- Apply fix in both functions
- Handle delimiter variations
- Improve parsing reliability
2025-09-01 01:22:27 +08:00
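A sketch of the delimiter repair as a single regular expression. The commit's actual pattern is not shown, so this one is an assumption that covers `<||>`, `< | >` and similar spacing variants:

```python
import re

# "<", optional spaces, "|", optional spaces, optional second "|", optional spaces, ">"
DELIMITER_VARIANTS = re.compile(r"<\s*\|\s*\|?\s*>")


def fix_tuple_delimiters(text: str) -> str:
    return DELIMITER_VARIANTS.sub("<|>", text)


raw = '("entity"<||>"Acme"< | >"Organization"<|>"A company")'
print(fix_tuple_delimiters(raw))
# ("entity"<|>"Acme"<|>"Organization"<|>"A company")
```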
yangdx
4e751e0653
refac: Enhance extraction with improved prompts and parser
...
- **Prompts**: Restructured prompts with clearer steps and quality guidelines. Simplified the relationship tuple by removing `relationship_strength`
- **Model**: Updated default entity types to be more comprehensive and consistently capitalized (e.g., `Location`, `Product`)
2025-08-31 22:24:11 +08:00
yangdx
75de40da41
Fix typo in relationship extraction log messages
2025-08-31 17:45:16 +08:00
yangdx
97c9600085
Improve extraction error handling and field validation
...
• Add field count validation warnings
• Fix relationship field count (5→6)
• Change error logs to warnings
2025-08-31 17:33:42 +08:00
yangdx
b747417961
feat: enhance extraction text sanitization and normalization
...
- Improve handling of redundant quotes in entity and relation names, types and keywords
- Add HTML tag cleaning and Chinese symbol conversion
- Filter out short numeric content and malformed text
- Enhance entity type validation with character filtering
2025-08-31 13:17:20 +08:00
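A sketch of the sanitization steps listed above for one field at a time. The symbol table and the numeric-length threshold are hypothetical choices, not values taken from the commit:

```python
import html
import re

# Curly quotes and a few full-width symbols mapped to ASCII (assumed subset).
SYMBOLS = str.maketrans({
    "\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'",
    "\uff0c": ",", "\uff08": "(", "\uff09": ")",
})


def sanitize_field(raw: str) -> str:
    """Sketch for a single already-split field (never the whole record:
    the tag regex would also remove the <|> delimiter)."""
    text = html.unescape(raw)
    text = re.sub(r"<[^>]+>", "", text)       # strip HTML tags
    text = text.translate(SYMBOLS)
    return text.strip().strip("\"'").strip()  # drop redundant outer quotes


def is_valid_name(name: str, min_numeric_len: int = 3) -> bool:
    # Reject empty names and short purely numeric strings such as "42";
    # the length threshold is hypothetical.
    if not name:
        return False
    return not (name.isdigit() and len(name) < min_numeric_len)


print(sanitize_field("\u201c<b>Acme Corp</b>\u201d"))  # Acme Corp
print(is_valid_name("42"), is_valid_name("Route 66"))  # False True
```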
yangdx
d4bbc5dea9
refactor: Merge multi-step text sanitization into single function
2025-08-31 10:36:56 +08:00
yangdx
03d0fa3014
perf: add optional query_embedding parameter to avoid redundant embedding calls
2025-08-29 18:15:45 +08:00
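A sketch of the pattern behind an optional `query_embedding` argument: embed the query once and hand the vector to every storage query, which only embeds on its own when no vector is supplied. `CountingEmbedder`, `vector_query` and `build_query_context` are hypothetical stand-ins, not LightRAG's classes:

```python
import asyncio
from typing import List, Optional


class CountingEmbedder:
    """Stand-in embedding client (assumption) that counts how often it is called."""

    def __init__(self) -> None:
        self.calls = 0

    async def embed(self, texts: List[str]) -> List[List[float]]:
        self.calls += 1
        return [[float(len(t)), 1.0] for t in texts]  # toy vectors


async def vector_query(query: str, embedder: CountingEmbedder,
                       query_embedding: Optional[List[float]] = None) -> List[float]:
    # Hypothetical storage query: embed only when no vector was passed in.
    if query_embedding is None:
        query_embedding = (await embedder.embed([query]))[0]
    return query_embedding  # a real store would run a similarity search here


async def build_query_context(query: str, embedder: CountingEmbedder) -> None:
    # Embed the query once and reuse the vector for every vector search.
    emb = (await embedder.embed([query]))[0]
    for _store in ("entities", "relationships", "chunks"):
        await vector_query(query, embedder, query_embedding=emb)


embedder = CountingEmbedder()
asyncio.run(build_query_context("who founded Acme?", embedder))
print(embedder.calls)  # 1
```

Because the parameter defaults to `None`, existing callers that do not pass a vector keep working unchanged.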
yangdx
a923d378dd
Remove deprecated ID-based filtering from vector storage queries
...
- Remove ids param from QueryParam
- Simplify BaseVectorStorage.query signature
- Update all vector storage implementations
- Streamline PostgreSQL query templates
- Remove ID filtering from operate.py calls
2025-08-29 17:06:48 +08:00
yangdx
99e28e815b
fix: prevent document processing failures from UTF-8 surrogate characters
...
- Change sanitize_text_for_encoding to fail-fast instead of returning error placeholders
- Add strict UTF-8 cleaning pipeline to entity/relationship extraction
- Skip problematic entities/relationships instead of corrupting data
Fixes document processing crashes when encountering surrogate characters (U+D800-U+DFFF)
2025-08-27 23:52:39 +08:00
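A fail-fast check along the lines described above: lone surrogates (U+D800 to U+DFFF) cannot be encoded as UTF-8, so attempting the encode surfaces them immediately. The function names are hypothetical:

```python
from typing import List


def clean_text_strict(text: str) -> str:
    """Hypothetical fail-fast check: raise instead of inserting placeholder
    text when the string cannot be encoded as UTF-8 (e.g. lone surrogates)."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError as e:
        raise ValueError(
            f"non-encodable character {text[e.start]!r} at position {e.start}"
        ) from e
    return text


def keep_valid(names: List[str]) -> List[str]:
    kept = []
    for name in names:
        try:
            kept.append(clean_text_strict(name))
        except ValueError:
            continue  # skip the entity rather than store corrupted data
    return kept


print(keep_valid(["Acme", "bad\ud800name", "Oslo"]))  # ['Acme', 'Oslo']
```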
yangdx
6a2a592224
Fix linting
2025-08-27 12:51:50 +08:00
yangdx
28e07c89f9
Fix linting
2025-08-27 12:35:51 +08:00
yangdx
2ccc39de9a
Fix language fallback in summarize error
2025-08-27 12:34:27 +08:00
yangdx
ff0a18e08c
Unify SUMMARY_LANGUAGE and ENTITY_TYPES implementation method
2025-08-27 12:23:22 +08:00
Thibo Rosemplatt
c3aabfc251
Merge branch 'main' into entityTypesServerSupport
2025-08-26 21:48:20 +02:00
yangdx
d3623cc9ae
fix: resolve infinite loop risk in _handle_entity_relation_summary
...
- Ensure oversized descriptions are force-merged with subsequent ones
- Add len(current_list) <= 2 termination condition to guarantee convergence
- Implement token-based truncation in _summarize_descriptions to prevent overflow
2025-08-26 21:58:31 +08:00
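How this fix and the map-reduce commit 882d6857d8 fit together can be sketched as follows. The whitespace tokenizer, the batching rule and the `summarize` callback are assumptions standing in for the real tokenizer and LLM call. Every batch except possibly the last holds at least two items, so each round strictly shrinks the list, and the `len(current) <= 2` exit guarantees a final call:

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    # Stand-in tokenizer (assumption): whitespace-separated words.
    return len(text.split())


def map_reduce_summarize(
    descriptions: List[str],
    summarize: Callable[[List[str]], str],
    context_size: int,
) -> str:
    """Sketch: batch descriptions to fit context_size, summarize each batch,
    and repeat on the summaries until a single result remains."""
    current = list(descriptions)
    while len(current) > 1:
        total = sum(count_tokens(d) for d in current)
        if total <= context_size or len(current) <= 2:
            # Everything fits, or only two items remain: one final call ends the loop.
            return summarize(current)
        batches: List[List[str]] = []
        batch: List[str] = []
        batch_tokens = 0
        for d in current:
            t = count_tokens(d)
            # Close a batch on overflow, but never as a single item: an
            # oversized description is force-merged with the next one.
            if batch and batch_tokens + t > context_size and len(batch) >= 2:
                batches.append(batch)
                batch, batch_tokens = [], 0
            batch.append(d)
            batch_tokens += t
        if batch:
            batches.append(batch)
        # A trailing single-item batch passes through unsummarized.
        current = [summarize(b) if len(b) > 1 else b[0] for b in batches]
    return current[0] if current else ""


def fake_summarize(items: List[str]) -> str:
    # Stand-in for the LLM call (assumption): keep the first 5 words of each item.
    return " | ".join(" ".join(i.split()[:5]) for i in items)


descs = [f"description {i} " + "detail " * 12 for i in range(8)]
print(map_reduce_summarize(descs, fake_summarize, context_size=40))
```

In a real implementation each `summarize` call would also truncate its input by tokens, which is the overflow guard the entry mentions for `_summarize_descriptions`.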
yangdx
79e0226b2b
Refactor: move force_llm_summary_on_merge to global_config access
...
- Remove parameter from function signature
- Access from global_config instead
- Improve code consistency
2025-08-26 18:02:39 +08:00
yangdx
6bcfe696ee
feat: add output length recommendation and description type to LLM summary
...
- Add SUMMARY_LENGTH_RECOMMENDED parameter (600 tokens)
- Optimize prompt template for LLM summary
2025-08-26 14:41:12 +08:00
yangdx
025f70089a
Simplify status messages in knowledge rebuild operations
2025-08-26 04:26:15 +08:00
yangdx
9eb2be79b8
feat: track actual LLM usage in entity/relation merging
...
- Modified _handle_entity_relation_summary to return tuple[str, bool]
- Updated merge functions to log "LLMmerg" vs "Merging" based on actual LLM usage
- Replaced hardcoded fragment count prediction with real-time LLM usage tracking
2025-08-26 03:56:18 +08:00
yangdx
de2daf6565
refac: Rename summary_max_tokens to summary_context_size and add comprehensive parameter validation for summary configuration
...
- Update algorithm logic in operate.py for better token management
- Fix health endpoint to use correct parameter names
2025-08-26 01:35:50 +08:00
yangdx
91767ffcee
Improve warning message formatting in entity/relationship rebuild
2025-08-25 21:55:29 +08:00
yangdx
15cdd0dd8f
fix: Sort cached extraction results by the create_time within each chunk
...
This ensures the KG rebuilds maintain the original creation order of the first extraction result for each chunk.
2025-08-25 21:41:33 +08:00
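A sketch of grouping cached extraction results by chunk and ordering each group by creation time. The field names `chunk_id`, `create_time` and `result` are assumptions about the cache layout:

```python
from collections import defaultdict
from typing import Dict, List


def group_and_sort_cache(entries: List[dict]) -> Dict[str, List[dict]]:
    """Group cached extraction results by chunk_id and sort each group by
    create_time, so a rebuild replays them in their original order."""
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for e in entries:
        grouped[e["chunk_id"]].append(e)
    for results in grouped.values():
        results.sort(key=lambda e: e.get("create_time", 0))
    return dict(grouped)


cache = [
    {"chunk_id": "chunk-1", "create_time": 30, "result": "gleaning"},
    {"chunk_id": "chunk-2", "create_time": 5, "result": "initial"},
    {"chunk_id": "chunk-1", "create_time": 10, "result": "initial"},
]
ordered = group_and_sort_cache(cache)
print([e["result"] for e in ordered["chunk-1"]])  # ['initial', 'gleaning']
```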
yangdx
882d6857d8
feat: Implement map-reduce summarization to handle merging a large number of descriptions
2025-08-25 21:03:16 +08:00
yangdx
cac8e189e7
Remove redundant entity vector deletion before upsert
2025-08-25 17:18:51 +08:00
yangdx
9b6de7512d
Improve the stability of description merging order
2025-08-25 17:10:51 +08:00
yangdx
31f4f96944
Exclude conversation history from context length calculation
2025-08-25 12:43:34 +08:00
yangdx
f688e95f56
Add warning for vector chunks missing chunk_id
2025-08-25 12:42:25 +08:00
yangdx
b6aedba7ae
Add logging for empty naive query results in vector context
2025-08-25 12:21:31 +08:00
yangdx
f1ff5cf93f
fix: initialize truncated_chunks variable in _build_query_context
...
Prevents the "local variable 'truncated_chunks' referenced before assignment" error
2025-08-25 11:56:56 +08:00
Thibo Rosemplatt
d054ec5d00
Added entity_types as a user-defined variable (via .env)
2025-08-23 20:16:11 +02:00
yangdx
9bc349ddd6
Improve empty keyword handling logic
2025-08-23 11:50:58 +08:00
yangdx
b5c230abdd
optimize: avoid duplicate embedding calls in _build_query_context
...
Reduces API costs and improves query performance while maintaining backward compatibility.
2025-08-21 16:49:24 +08:00
yangdx
2a7fec2873
Optimize keyword extraction prompt and remove conversation history from keyword extraction
...
- Remove history context processing
- Update prompt to focus on single query
- Clarify high/low level keyword types
- Improve JSON output instructions
- Add edge case handling guidance
2025-08-18 23:35:04 +08:00
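A sketch of defensive parsing for the JSON keyword reply, treating empty or malformed output as "no keywords". The key names `high_level_keywords` and `low_level_keywords` are assumptions about the prompt's output schema:

```python
import json
import re
from typing import List, Tuple


def _clean(values: object) -> List[str]:
    # Keep only non-empty strings; anything other than a list counts as empty.
    if not isinstance(values, list):
        return []
    return [v.strip() for v in values if isinstance(v, str) and v.strip()]


def parse_keywords(llm_output: str) -> Tuple[List[str], List[str]]:
    """Extract (high-level, low-level) keyword lists from an LLM reply that may
    wrap the JSON object in extra text or code fences."""
    match = re.search(r"\{.*\}", llm_output, re.DOTALL)
    if not match:
        return [], []
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return [], []
    return _clean(data.get("high_level_keywords")), _clean(data.get("low_level_keywords"))


reply = ('Sure:\n{"high_level_keywords": ["Corporate history"], '
         '"low_level_keywords": ["Acme", ""]}')
print(parse_keywords(reply))  # (['Corporate history'], ['Acme'])
```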
yangdx
d3fde60938
refactor: remove file_path and created_at from context, improve token truncation
...
- Remove file_path and created_at fields from entity and relationship contexts
- Update token truncation to include full JSON serialization instead of content only
2025-08-18 18:30:09 +08:00
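A sketch of truncation that charges each context item for its full JSON serialization rather than its content field alone. The whitespace tokenizer is an assumption standing in for the real one:

```python
import json
from typing import List


def count_tokens(text: str) -> int:
    # Stand-in tokenizer (assumption): whitespace-separated words.
    return len(text.split())


def truncate_by_json_tokens(items: List[dict], max_tokens: int) -> List[dict]:
    """Keep items in order while the token count of each item's full JSON
    serialization (keys, quotes and all fields) stays within max_tokens."""
    kept, used = [], 0
    for item in items:
        cost = count_tokens(json.dumps(item, ensure_ascii=False))
        if used + cost > max_tokens:
            break
        kept.append(item)
        used += cost
    return kept


entities = [
    {"entity": "Acme", "description": "A company"},
    {"entity": "Oslo", "description": "A city"},
]
# Each item costs 5 tokens as JSON but only 2 as bare description text.
print(len(truncate_by_json_tokens(entities, max_tokens=8)))  # 1
```

Counting the serialized form makes the budget match what is actually placed in the prompt, so the context no longer overshoots the limit by the size of the JSON scaffolding.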
yangdx
1e2d5252d7
Add get_vectors_by_ids method and filter out vector data from query results
2025-08-15 16:32:26 +08:00
yangdx
6cab68bb47
Improve KG chunk selection documentation and configuration clarity
2025-08-15 10:09:44 +08:00