mirror of
https://github.com/Unstructured-IO/unstructured.git
synced 2025-12-11 23:21:32 +00:00
Updates the characters used to split text when creating candidate English words. Each word now has its non-alphabetic characters stripped out with a regex.

Note: This was originally an attempt to speed up contains_english_word(), but there is no measurable change in performance.
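Below is a minimal Python sketch of the approach described above. It assumes that text is first split on whitespace and that a regex then removes every non-alphabetic character from each token. `ENGLISH_WORDS`, `candidate_words`, and `contains_english_word_sketch` are hypothetical names used only for illustration. The vocabulary set is a tiny placeholder, not the library's actual word list or implementation.

```python
import re

# Hypothetical placeholder vocabulary; the real library uses its own word list.
ENGLISH_WORDS = {"hello", "world", "data", "text"}

# Matches any single character that is not an ASCII letter.
NON_ALPHA = re.compile(r"[^a-zA-Z]")


def candidate_words(text: str) -> list[str]:
    """Split on whitespace, then strip non-alphabetic characters from each token."""
    words = []
    for token in text.split():
        cleaned = NON_ALPHA.sub("", token).lower()
        if cleaned:  # skip tokens that were entirely punctuation or digits
            words.append(cleaned)
    return words


def contains_english_word_sketch(text: str) -> bool:
    """Return True if any cleaned candidate word is in the vocabulary."""
    return any(word in ENGLISH_WORDS for word in candidate_words(text))
```

With this approach, `"world!"` becomes `"world"` and matches the vocabulary, while a token such as `"123"` cleans to an empty string and is dropped. Stripping characters inside a token joins its pieces (`"hello-world"` becomes `"helloworld"`). Splitting on those characters instead would yield two separate candidates, so which behavior is wanted depends on the intended matching.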