From e827b42e1e82fee28fd25f37765680664e35cd76 Mon Sep 17 00:00:00 2001
From: rasbt
Date: Wed, 25 Oct 2023 18:46:40 -0500
Subject: [PATCH] add readme files

---
 ch01/README.md                              | 3 ++-
 ch02/01_main-chapter-code/README.md         | 5 +++++
 ch02/02_bonus_bytepair-encoder/README.md    | 7 +++++++
 ch02/03_bonus_embedding-vs-matmul/README.md | 3 +++
 ch02/README.md                              | 7 +++++++
 5 files changed, 24 insertions(+), 1 deletion(-)
 create mode 100644 ch02/01_main-chapter-code/README.md
 create mode 100644 ch02/02_bonus_bytepair-encoder/README.md
 create mode 100644 ch02/03_bonus_embedding-vs-matmul/README.md
 create mode 100644 ch02/README.md

diff --git a/ch01/README.md b/ch01/README.md
index 1e7ac0c..50002e6 100644
--- a/ch01/README.md
+++ b/ch01/README.md
@@ -1,2 +1,3 @@
-Details will follow ...
+# Chapter 1: Understanding Large Language Models
 
+There is no code in this chapter.
diff --git a/ch02/01_main-chapter-code/README.md b/ch02/01_main-chapter-code/README.md
new file mode 100644
index 0000000..646bf68
--- /dev/null
+++ b/ch02/01_main-chapter-code/README.md
@@ -0,0 +1,5 @@
+# Chapter 2: Working with Text Data
+
+- [ch02.ipynb](ch02.ipynb) has all the code as it appears in the chapter
+- [dataloader.ipynb](dataloader.ipynb) is a minimal notebook with the main data loading pipeline implemented in this chapter
+
diff --git a/ch02/02_bonus_bytepair-encoder/README.md b/ch02/02_bonus_bytepair-encoder/README.md
new file mode 100644
index 0000000..30a1946
--- /dev/null
+++ b/ch02/02_bonus_bytepair-encoder/README.md
@@ -0,0 +1,7 @@
+# Chapter 2: Working with Text Data
+
+
+
+- [compare-bpe-tiktoken.ipynb](compare-bpe-tiktoken.ipynb) benchmarks various byte pair encoding implementations
+- [bpe_openai_gpt2.py](bpe_openai_gpt2.py) is the original bytepair encoder code used by OpenAI
+
diff --git a/ch02/03_bonus_embedding-vs-matmul/README.md b/ch02/03_bonus_embedding-vs-matmul/README.md
new file mode 100644
index 0000000..a1f67ef
--- /dev/null
+++ b/ch02/03_bonus_embedding-vs-matmul/README.md
@@ -0,0 +1,3 @@
+# Chapter 2: Working with Text Data
+
+- [embeddings-and-linear-layers.ipynb](embeddings-and-linear-layers.ipynb) contains optional (bonus) code to explain that embedding layers and fully connected layers applied to one-hot encoded vectors are equivalent.
diff --git a/ch02/README.md b/ch02/README.md
new file mode 100644
index 0000000..7c085a9
--- /dev/null
+++ b/ch02/README.md
@@ -0,0 +1,7 @@
+# Chapter 2: Working with Text Data
+
+- [01_main-chapter-code](01_main-chapter-code) contains the main chapter code
+
+- [02_bonus_bytepair-encoder](02_bonus_bytepair-encoder) contains optional code to benchmark different byte pair encoder implementations
+
+- [03_bonus_embedding-vs-matmul](03_bonus_embedding-vs-matmul) contains optional (bonus) code to explain that embedding layers and fully connected layers applied to one-hot encoded vectors are equivalent.