mirror of
https://github.com/HKUDS/LightRAG.git
synced 2025-11-22 13:06:10 +00:00
- Change sanitize_text_for_encoding to fail-fast instead of returning error placeholders
- Add strict UTF-8 cleaning pipeline to entity/relationship extraction
- Skip problematic entities/relationships instead of corrupting data

Fixes document processing crashes when encountering surrogate characters (U+D800-U+DFFF)
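
The fail-fast and skip behaviour described above can be illustrated with a minimal sketch. This is not the code from this commit: `ensure_utf8` and `keep_encodable` are hypothetical names chosen for illustration, and the actual signature and cleaning steps of `sanitize_text_for_encoding` may differ. The sketch only assumes that lone surrogates (U+D800-U+DFFF) cannot be encoded as UTF-8, which is how Python's `str.encode` behaves.

```python
from typing import Iterable, List


def ensure_utf8(text: str) -> str:
    """Hypothetical helper: return text unchanged if it encodes as UTF-8."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError as exc:
        # Fail fast: raise instead of returning an error placeholder string.
        raise ValueError(f"text is not UTF-8 encodable: {exc.reason}") from exc
    return text


def keep_encodable(names: Iterable[str]) -> List[str]:
    """Hypothetical helper: keep names that pass ensure_utf8, skip the rest."""
    kept = []
    for name in names:
        try:
            kept.append(ensure_utf8(name))
        except ValueError:
            # Skip the problematic entry rather than store corrupted data.
            continue
    return kept


if __name__ == "__main__":
    # "bad\ud800name" contains a lone surrogate and is dropped.
    print(keep_encodable(["Alice", "bad\ud800name", "Bob"]))
```

Raising from the low-level check and catching at the call site keeps the decision to skip in the extraction loop, so a bad entity or relationship name is dropped without a placeholder ever reaching storage.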