haystack/releasenotes/notes/add-unicode-normalization-and-ascii-mode-to-document-cleaner-ba536b46e499663c.yaml
Tim Wellbrock 2e2f5f17bb
feat: add unicode normalization & ascii_only mode for DocumentCleaner (#8103)
* feat: add unicode normalization & ascii_only mode for DocumentCleaner.

* feat: add unicode_normalization parameter validation to DocumentCleaner.

* test: fix the unit test to work after code linting.
2024-08-05 13:00:39 +02:00


---
enhancements:
  - |
    Added the `unicode_normalization` parameter to the DocumentCleaner, which normalizes text to one of the Unicode normalization forms NFC, NFD, NFKC, or NFKD.
  - |
    Added the `ascii_only` parameter to the DocumentCleaner, which converts letters with diacritics to their ASCII equivalents and removes all other non-ASCII characters.
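As a rough illustration of what the two options do, here is a minimal Python sketch built on the standard library's `unicodedata` module. The `clean_text` helper and the `VALID_FORMS` tuple are hypothetical and are not Haystack's code. The sketch also assumes that normalization runs before the ASCII conversion. In Haystack itself the options are passed to the component as constructor arguments, e.g. `DocumentCleaner(unicode_normalization="NFKC", ascii_only=True)`.

```python
import unicodedata
from typing import Literal, Optional

# Hypothetical helper, not Haystack's implementation: it mirrors the two
# DocumentCleaner options from the release note on a plain string.
NormalizationForm = Literal["NFC", "NFD", "NFKC", "NFKD"]
VALID_FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def clean_text(
    text: str,
    unicode_normalization: Optional[NormalizationForm] = None,
    ascii_only: bool = False,
) -> str:
    if unicode_normalization is not None:
        # Reject anything that is not one of the four standard forms.
        if unicode_normalization not in VALID_FORMS:
            raise ValueError(
                f"unicode_normalization must be one of {VALID_FORMS}, "
                f"got {unicode_normalization!r}"
            )
        text = unicodedata.normalize(unicode_normalization, text)
    if ascii_only:
        # NFKD splits a letter such as U+00E9 into "e" plus a combining accent.
        # Encoding with errors="ignore" then drops the accent together with any
        # character that has no ASCII decomposition (e.g. U+00DF or U+2713).
        text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return text


print(len(clean_text("Caf\u00e9", unicode_normalization="NFD")))  # 5: "e" and the accent are separate code points
print(clean_text("\ufb01le", unicode_normalization="NFKC"))        # file: the "fi" ligature is expanded
print(clean_text("Cr\u00e8me br\u00fbl\u00e9e", ascii_only=True))   # Creme brulee
print(clean_text("Stra\u00dfe", ascii_only=True))                   # Strae: the sharp s has no ASCII decomposition
```

The sketch uses NFKD rather than NFD for the ASCII step because compatibility decomposition also breaks up ligatures and similar presentation forms. As a result, more characters survive as plain ASCII letters instead of being dropped.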