# Chapter 2: Working with Text Data
 
## Main Chapter Code
- [01_main-chapter-code](01_main-chapter-code) contains the main chapter code and exercise solutions
 
## Bonus Materials
- [02_bonus_bytepair-encoder](02_bonus_bytepair-encoder) contains optional (bonus) code to benchmark different byte pair encoder implementations
- [03_bonus_embedding-vs-matmul](03_bonus_embedding-vs-matmul) contains optional (bonus) code showing that embedding layers and fully connected layers applied to one-hot encoded vectors are equivalent
- [04_bonus_dataloader-intuition](04_bonus_dataloader-intuition) contains optional (bonus) code explaining the data loader more intuitively with simple numbers rather than text
- [05_bpe-from-scratch](05_bpe-from-scratch) contains optional (bonus) code that implements and trains a GPT-2 BPE tokenizer from scratch
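As a quick taste of the embedding-vs-matmul equivalence mentioned above, here is a minimal NumPy sketch (not the bonus notebook's code; the variable names and toy dimensions are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(123)

vocab_size, embed_dim = 5, 3
token_ids = np.array([2, 0, 4])

# Hypothetical embedding weight matrix of shape (vocab_size, embed_dim)
W = rng.normal(size=(vocab_size, embed_dim))

# An embedding layer is just a row lookup into W
embedded = W[token_ids]

# The "fully connected" view: one-hot encode the IDs, then matrix-multiply
one_hot = np.eye(vocab_size)[token_ids]  # shape (3, vocab_size)
fc_output = one_hot @ W                  # shape (3, embed_dim)

# Both paths produce identical results
assert np.allclose(embedded, fc_output)
```

The lookup is simply a cheaper way to compute the same result, since multiplying by a one-hot vector selects a single row of the weight matrix.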
In the video below, I provide a code-along session that covers some of the chapter contents as supplementary material.
<br>
<br>
[![Link to the video](https://img.youtube.com/vi/341Rb8fJxY0/0.jpg)](https://www.youtube.com/watch?v=341Rb8fJxY0)