Sebastian Raschka
|
4ff743051e
|
BPE cosmetics (#629)
* Llama3 from scratch improvements
* Cosmetic BPE improvements
* restore
* Update ch02/05_bpe-from-scratch/bpe-from-scratch.ipynb
* Update ch02/05_bpe-from-scratch/bpe-from-scratch.ipynb
* endoftext whitespace
|
2025-04-18 18:57:09 -05:00 |
|
Sebastian Raschka
|
72efebd7f8
|
add special token handling to bpe from scratch code (#616)
|
2025-04-13 12:38:22 -05:00 |
|
Sebastian Raschka
|
2f41429cf4
|
Cosmetic improvements to the BPE code (#562)
|
2025-03-09 10:49:40 -05:00 |
|
Sebastian Raschka
|
f63f04d8d5
|
Fix BPE bonus materials (#561)
* Fix BPE bonus materials
* fix bpe implementation
* update
* Add 'Hello, world. Is this-- a test?' test case
* update link to test file
* update path handling
* update path handling
* fix pytest paths
|
2025-03-08 17:21:30 -06:00 |
|
Kasen
|
7bd36dccb4
|
Improve BPE vocabulary saving and pair frequency handling (#539)
|
2025-02-19 09:51:04 -06:00 |
|
Kasen
|
b47884ced0
|
Fix incorrect indentation (#536)
|
2025-02-18 14:47:31 -06:00 |
|
Austin Welch
|
0f35e370ed
|
fix: preserve newline tokens in BPE encoder (#495)
* fix: preserve newline tokens in BPE encoder
* further fixes
* more fixes
---------
Co-authored-by: rasbt <mail@sebastianraschka.com>
|
2025-01-21 12:47:15 -06:00 |
|
Daniel Kleine
|
60acb94894
|
BPE: fixed typo (#492)
* fixed typo
* use rel path if exists
* mod gitignore and use existing vocab files
---------
Co-authored-by: rasbt <mail@sebastianraschka.com>
|
2025-01-20 20:49:53 -06:00 |
|
Sebastian Raschka
|
0d4967eda6
|
Implementing the BPE Tokenizer from Scratch (#487)
|
2025-01-17 12:22:00 -06:00 |
|