4 Commits

Author SHA1 Message Date
Sebastian Raschka
a22d612be6
Bonus material: extending tokenizers (#496)
* Bonus material: extending tokenizers

* small wording update
2025-01-22 09:26:54 -06:00
Austin Welch
0f35e370ed
fix: preserve newline tokens in BPE encoder (#495)
* fix: preserve newline tokens in BPE encoder

* further fixes

* more fixes

---------

Co-authored-by: rasbt <mail@sebastianraschka.com>
2025-01-21 12:47:15 -06:00
Daniel Kleine
60acb94894
BPE: fixed typo (#492)
* fixed typo

* use rel path if exists

* mod gitignore and use existing vocab files

---------

Co-authored-by: rasbt <mail@sebastianraschka.com>
2025-01-20 20:49:53 -06:00
Sebastian Raschka
0d4967eda6
Implementing the BPE Tokenizer from Scratch (#487)
2025-01-17 12:22:00 -06:00