# Chapter 2: Working with Text Data
## Main Chapter Code
- [01_main-chapter-code](01_main-chapter-code) contains the main chapter code and exercise solutions.
## Bonus Materials
- [02_bonus_bytepair-encoder](02_bonus_bytepair-encoder) contains optional code to benchmark different byte pair encoder implementations.
- [03_bonus_embedding-vs-matmul](03_bonus_embedding-vs-matmul) contains optional (bonus) code to explain that embedding layers and fully connected layers applied to one-hot encoded vectors are equivalent.
- [04_bonus_dataloader-intuition](04_bonus_dataloader-intuition) contains optional (bonus) code to explain the data loader more intuitively with simple numbers rather than text.
- [05_bpe-from-scratch](05_bpe-from-scratch) contains (bonus) code that implements and trains a GPT-2 BPE tokenizer from scratch.
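As a minimal illustration of the equivalence covered in [03_bonus_embedding-vs-matmul](03_bonus_embedding-vs-matmul), the sketch below (written with NumPy here rather than the chapter's PyTorch code) shows that looking up rows of a weight matrix by token ID produces the same result as multiplying one-hot vectors by that same matrix:

```python
import numpy as np

np.random.seed(123)

vocab_size, embed_dim = 5, 3
weight = np.random.rand(vocab_size, embed_dim)  # shared weight matrix

token_ids = np.array([2, 0, 4])

# 1) Embedding-style lookup: select rows of the weight matrix by token ID
lookup = weight[token_ids]

# 2) Fully connected layer applied to one-hot encoded inputs: one_hot @ weight
one_hot = np.eye(vocab_size)[token_ids]  # shape (3, 5), one row per token
matmul = one_hot @ weight                # shape (3, 3)

print(np.allclose(lookup, matmul))  # → True
```

In other words, an embedding layer is simply a more efficient way to compute the one-hot matrix multiplication, since it skips the multiplications by zero.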
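The data loader intuition from [04_bonus_dataloader-intuition](04_bonus_dataloader-intuition) can be sketched in plain Python (no PyTorch) by treating the integers 0–9 as stand-in "token IDs" and generating input/target pairs where the target is the input shifted by one position; the variable names below are illustrative, not the repository's:

```python
data = list(range(10))  # stand-in for a tokenized text
context_length = 4      # number of tokens the model sees at once
stride = 1              # step size between consecutive windows

pairs = []
for i in range(0, len(data) - context_length, stride):
    inputs = data[i : i + context_length]
    targets = data[i + 1 : i + context_length + 1]  # inputs shifted by one
    pairs.append((inputs, targets))

print(pairs[0])  # → ([0, 1, 2, 3], [1, 2, 3, 4])
```

Using numbers instead of text makes it easy to see how `context_length` and `stride` control the overlap between consecutive training examples.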
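The core training step behind [05_bpe-from-scratch](05_bpe-from-scratch) can be sketched in a few lines: count the frequencies of adjacent ID pairs and merge the most frequent pair into a new token ID. This is a simplified stand-alone sketch (the function names are illustrative, not the repository's), not the full GPT-2 tokenizer:

```python
from collections import Counter

def most_frequent_pair(ids):
    """Count adjacent ID pairs and return the most common one."""
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]

def merge_pair(ids, pair, new_id):
    """Replace every occurrence of `pair` with the single ID `new_id`."""
    merged, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            merged.append(new_id)
            i += 2  # skip both members of the merged pair
        else:
            merged.append(ids[i])
            i += 1
    return merged

# Toy example: the byte values of "abab", merging ('a', 'b') into token 256
ids = list(b"abab")             # [97, 98, 97, 98]
pair = most_frequent_pair(ids)  # (97, 98)
ids = merge_pair(ids, pair, 256)
print(ids)  # → [256, 256]
```

Repeating this merge step until a target vocabulary size is reached is, in essence, how a BPE tokenizer is trained.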
In the video below, I provide a code-along session that covers some of the chapter contents as supplementary material.
<br>
<br>
[Link to the video](https://www.youtube.com/watch?v=yAcWnfsZhzo)