Mirror of https://github.com/rasbt/LLMs-from-scratch.git, synced 2025-11-03 19:30:26 +00:00
Chapter 2: Working with Text Data
Main Chapter Code
- 01_main-chapter-code contains the main chapter code and exercise solutions
 
Bonus Materials
- 02_bonus_bytepair-encoder contains optional code to benchmark different byte pair encoder implementations
- 03_bonus_embedding-vs-matmul contains optional (bonus) code to explain that embedding layers and fully connected layers applied to one-hot encoded vectors are equivalent
- 04_bonus_dataloader-intuition contains optional (bonus) code to explain the data loader more intuitively with simple numbers rather than text
- 05_bpe-from-scratch contains optional (bonus) code that implements and trains a GPT-2 BPE tokenizer from scratch
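The equivalence mentioned for 03_bonus_embedding-vs-matmul can be sketched in a few lines. The tensor sizes, seed, and token IDs below are illustrative assumptions, not the notebook's exact code:

```python
import torch

torch.manual_seed(123)

num_tokens, embed_dim = 5, 3
token_ids = torch.tensor([2, 0, 4])  # example token IDs (assumed)

# Embedding lookup: selects rows of the weight matrix by index
embedding = torch.nn.Embedding(num_tokens, embed_dim)

# Linear layer applied to one-hot vectors, sharing the same weights
linear = torch.nn.Linear(num_tokens, embed_dim, bias=False)
with torch.no_grad():
    linear.weight.copy_(embedding.weight.T)

one_hot = torch.nn.functional.one_hot(token_ids, num_classes=num_tokens).float()

# Both paths produce identical results: a one-hot matmul just
# selects a column of the weight matrix, which is exactly a lookup
print(torch.allclose(embedding(token_ids), linear(one_hot)))  # True
```

The lookup is just the cheaper implementation: it skips multiplying all the zeros in the one-hot vectors.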
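The core idea behind the from-scratch BPE tokenizer in 05_bpe-from-scratch can be summarized as: start from individual characters and repeatedly merge the most frequent adjacent pair. The following is a minimal sketch of that training loop under a toy corpus, not the notebook's implementation (which follows the GPT-2 byte-level variant):

```python
from collections import Counter

def most_frequent_pair(token_seqs):
    # Count adjacent symbol pairs across all tokenized words
    pairs = Counter()
    for seq in token_seqs:
        for a, b in zip(seq, seq[1:]):
            pairs[(a, b)] += 1
    return max(pairs, key=pairs.get) if pairs else None

def merge_pair(token_seqs, pair):
    # Replace every occurrence of the pair with a single merged symbol
    merged = pair[0] + pair[1]
    out = []
    for seq in token_seqs:
        new_seq, i = [], 0
        while i < len(seq):
            if i < len(seq) - 1 and (seq[i], seq[i + 1]) == pair:
                new_seq.append(merged)
                i += 2
            else:
                new_seq.append(seq[i])
                i += 1
        out.append(new_seq)
    return out

# Toy corpus (assumed): start from characters, apply two merges
corpus = [list("lower"), list("lowest"), list("low")]
for _ in range(2):
    pair = most_frequent_pair(corpus)
    corpus = merge_pair(corpus, pair)
print(corpus)  # [['low', 'e', 'r'], ['low', 'e', 's', 't'], ['low']]
```

Each merge adds one entry to the vocabulary; a real tokenizer records the merge order so the same merges can be replayed at encoding time.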
In the video below, I provide a code-along session that covers some of the chapter contents as supplementary material.
