Sebastian Raschka | 4ff743051e | BPE cosmetics (#629) | 2025-04-18 18:57:09 -05:00
* Llama3 from scratch improvements
* Cosmetic BPE improvements
* restore
* Update ch02/05_bpe-from-scratch/bpe-from-scratch.ipynb
* Update ch02/05_bpe-from-scratch/bpe-from-scratch.ipynb
* endoftext whitespace
Sebastian Raschka | 72efebd7f8 | add special token handling to bpe from scratch code (#616) | 2025-04-13 12:38:22 -05:00
Sebastian Raschka | 2f41429cf4 | Cosmetic improvements to the BPE code (#562) | 2025-03-09 10:49:40 -05:00
Sebastian Raschka | f63f04d8d5 | Fix BPE bonus materials (#561) | 2025-03-08 17:21:30 -06:00
* Fix BPE bonus materials
* fix bpe implementation
* update
* Add 'Hello, world. Is this-- a test?' test case
* update link to test file
* update path handling
* update path handling
* fix pytest paths
Kasen | 7bd36dccb4 | Improve BPE vocabulary saving and pair frequency handling (#539) | 2025-02-19 09:51:04 -06:00
Kasen | b47884ced0 | Fix incorrect indentation (#536) | 2025-02-18 14:47:31 -06:00
Sebastian Raschka | a22d612be6 | Bonus material: extending tokenizers (#496) | 2025-01-22 09:26:54 -06:00
* Bonus material: extending tokenizers
* small wording update
Austin Welch | 0f35e370ed | fix: preserve newline tokens in BPE encoder (#495) | 2025-01-21 12:47:15 -06:00
* fix: preserve newline tokens in BPE encoder
* further fixes
* more fixes
---------
Co-authored-by: rasbt <mail@sebastianraschka.com>
Daniel Kleine | 60acb94894 | BPE: fixed typo (#492) | 2025-01-20 20:49:53 -06:00
* fixed typo
* use rel path if exists
* mod gitignore and use existing vocab files
---------
Co-authored-by: rasbt <mail@sebastianraschka.com>
Sebastian Raschka | 0d4967eda6 | Implementing the BPE Tokenizer from Scratch (#487) | 2025-01-17 12:22:00 -06:00