yangdx 99e28e815b fix: prevent document processing failures from UTF-8 surrogate characters
- Change sanitize_text_for_encoding to fail-fast instead of returning error placeholders
- Add strict UTF-8 cleaning pipeline to entity/relationship extraction
- Skip problematic entities/relationships instead of corrupting data

Fixes document processing crashes when encountering surrogate characters (U+D800-U+DFFF)
2025-08-27 23:52:39 +08:00
..
2025-08-23 02:39:12 +08:00
2025-08-27 12:51:18 +08:00
2025-06-29 01:28:39 +05:00