yujunjun/haystack
mirror of https://github.com/deepset-ai/haystack.git synced 2025-12-06 03:47:22 +00:00
haystack/releasenotes/notes/split-by-token-b9a4f954d4077ecc.yaml

3 lines
58 B
YAML

feat: PreProcessor split by token (tiktoken & Hugging Face) (#5276)

* #4983 implemented split by token for tiktoken tokenizer
* #4983 added unit test for tiktoken splitting
* #4983 implemented and added a test for splitting documents with HuggingFace tokenizer
* #4983 added support for passing HF model names (instead of objects) and added an example to the HF token splitting test
* mocked HTTP model loading in unit tests, fixed pylint error
* fix lossy tokenizers splitting, use LazyImport, ignore UnicodeEncodeError for tiktoken
* reno
* rename reno file

---------

Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-11-23 03:26:37 -08:00
features:
- Add token-based splitting to PreProcessor, so `split_length` can be measured in tokens (tiktoken and Hugging Face tokenizers)
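
The note above describes measuring `split_length` in tokens rather than words or sentences. As a rough illustration of the idea only (not the PreProcessor implementation), the sketch below chunks a text by token count. The function name `split_by_token` and its `tokenize`/`detokenize` parameters are hypothetical, and a whitespace tokenizer stands in for tiktoken or a Hugging Face tokenizer so the example runs without extra dependencies.

```python
from typing import Callable, List


def split_by_token(
    text: str,
    split_length: int,
    split_overlap: int = 0,
    # Assumption: a whitespace tokenizer stands in for tiktoken / Hugging Face.
    tokenize: Callable[[str], List[str]] = str.split,
    detokenize: Callable[[List[str]], str] = " ".join,
) -> List[str]:
    """Split `text` into chunks of at most `split_length` tokens.

    Hypothetical helper for illustration; consecutive chunks share
    `split_overlap` tokens.
    """
    if split_length <= 0:
        raise ValueError("split_length must be positive")
    if not 0 <= split_overlap < split_length:
        raise ValueError("split_overlap must be in [0, split_length)")

    tokens = tokenize(text)
    step = split_length - split_overlap
    chunks: List[str] = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + split_length]
        chunks.append(detokenize(window))
        # Stop once the window has reached the end of the token list,
        # so the overlap does not produce a redundant trailing chunk.
        if start + split_length >= len(tokens):
            break
    return chunks


if __name__ == "__main__":
    print(split_by_token("a b c d e", split_length=2))
    print(split_by_token("a b c d e", split_length=3, split_overlap=1))
```

With a real subword tokenizer, `tokenize` and `detokenize` would be the tokenizer's encode and decode functions. Decoding a window may not reproduce the original substring exactly for every tokenizer, which is presumably what the "lossy tokenizers" fix in the commit message addresses.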