# Chapter 2: Working with Text Data
 
## Main Chapter Code
- [01_main-chapter-code](01_main-chapter-code) contains the main chapter code and exercise solutions
 
## Bonus Materials
- [02_bonus_bytepair-encoder](02_bonus_bytepair-encoder) contains optional (bonus) code to benchmark different byte pair encoder implementations
- [03_bonus_embedding-vs-matmul](03_bonus_embedding-vs-matmul) contains optional (bonus) code showing that embedding layers and fully connected layers applied to one-hot encoded vectors are equivalent
- [04_bonus_dataloader-intuition](04_bonus_dataloader-intuition) contains optional (bonus) code explaining the data loader more intuitively with simple numbers rather than text
- [05_bpe-from-scratch](05_bpe-from-scratch) contains optional (bonus) code that implements and trains a GPT-2 BPE tokenizer from scratch
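As a quick taste of the embedding-vs-matmul equivalence mentioned above, here is a minimal NumPy sketch (not the bonus notebook's code; the variable names and toy dimensions are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(123)

vocab_size, embed_dim = 5, 3
token_ids = np.array([2, 0, 4])

# Hypothetical embedding weight matrix of shape (vocab_size, embed_dim)
W = rng.normal(size=(vocab_size, embed_dim))

# An embedding layer is just a row lookup into W
embedded = W[token_ids]

# The "fully connected" view: one-hot encode the IDs, then matrix-multiply
one_hot = np.eye(vocab_size)[token_ids]  # shape (3, vocab_size)
fc_output = one_hot @ W                  # shape (3, embed_dim)

# Both paths produce identical results
assert np.allclose(embedded, fc_output)
```

The lookup is simply a cheaper way to compute the same result, since multiplying by a one-hot vector selects a single row of the weight matrix.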
In the video below, I provide a code-along session that covers some of the chapter contents as supplementary material.
<br>
<br>
[![Link to the video](https://img.youtube.com/vi/341Rb8fJxY0/0.jpg)](https://www.youtube.com/watch?v=341Rb8fJxY0)