From 6a9bedc2ec62f248fa6d3b959b726d2f2a81d56b Mon Sep 17 00:00:00 2001
From: Sebastian Raschka
Date: Sat, 12 Oct 2024 10:26:08 -0500
Subject: [PATCH] Update bonus section formatting (#400)

---
 ch01/README.md | 9 ++++++++-
 ch02/README.md | 3 ++-
 ch03/README.md | 2 ++
 ch04/README.md | 7 +++++--
 ch05/README.md | 2 ++
 ch06/README.md | 3 ++-
 ch07/README.md | 2 ++
 7 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/ch01/README.md b/ch01/README.md
index f938fcc..b7d064c 100644
--- a/ch01/README.md
+++ b/ch01/README.md
@@ -1,8 +1,15 @@
 # Chapter 1: Understanding Large Language Models
 
+
+&nbsp;
+## Main Chapter Code
+
 There is no code in this chapter.
 
-
+
+&nbsp;
+## Bonus Materials
+
 As optional bonus material, below is a video tutorial where I explain the LLM development lifecycle covered in this book:
 
 <br>
diff --git a/ch02/README.md b/ch02/README.md
index b6f09d0..bb603ee 100644
--- a/ch02/README.md
+++ b/ch02/README.md
@@ -1,10 +1,11 @@
 # Chapter 2: Working with Text Data
 
-
+&nbsp;
 ## Main Chapter Code
 
 - [01_main-chapter-code](01_main-chapter-code) contains the main chapter code and exercise solutions
 
+&nbsp;
 ## Bonus Materials
 
 - [02_bonus_bytepair-encoder](02_bonus_bytepair-encoder) contains optional code to benchmark different byte pair encoder implementations
diff --git a/ch03/README.md b/ch03/README.md
index 46a7fd9..ad89208 100644
--- a/ch03/README.md
+++ b/ch03/README.md
@@ -1,9 +1,11 @@
 # Chapter 3: Coding Attention Mechanisms
 
+&nbsp;
 ## Main Chapter Code
 
 - [01_main-chapter-code](01_main-chapter-code) contains the main chapter code.
 
+&nbsp;
 ## Bonus Materials
 
 - [02_bonus_efficient-multihead-attention](02_bonus_efficient-multihead-attention) implements and compares different implementation variants of multihead-attention
diff --git a/ch04/README.md b/ch04/README.md
index 5891b2d..ad229d2 100644
--- a/ch04/README.md
+++ b/ch04/README.md
@@ -1,10 +1,13 @@
 # Chapter 4: Implementing a GPT Model from Scratch to Generate Text
 
+&nbsp;
 ## Main Chapter Code
 
 - [01_main-chapter-code](01_main-chapter-code) contains the main chapter code.
 
-## Optional Code
+&nbsp;
+## Bonus Materials
 
-- [02_performance-analysis](02_performance-analysis) contains optional code analyzing the performance of the GPT model(s) implemented in the main chapter.
+- [02_performance-analysis](02_performance-analysis) contains optional code analyzing the performance of the GPT model(s) implemented in the main chapter
+- [ch05/07_gpt_to_llama](../ch05/07_gpt_to_llama) contains a step-by-step guide for converting a GPT architecture implementation to Llama 3.2 and loading pretrained weights from Meta AI (it might be interesting to look at alternative architectures after completing chapter 4, but you can also save that for after reading chapter 5)
 
diff --git a/ch05/README.md b/ch05/README.md
index defa30b..4718a50 100644
--- a/ch05/README.md
+++ b/ch05/README.md
@@ -1,9 +1,11 @@
 # Chapter 5: Pretraining on Unlabeled Data
 
+&nbsp;
 ## Main Chapter Code
 
 - [01_main-chapter-code](01_main-chapter-code) contains the main chapter code
 
+&nbsp;
 ## Bonus Materials
 
 - [02_alternative_weight_loading](02_alternative_weight_loading) contains code to load the GPT model weights from alternative places in case the model weights become unavailable from OpenAI
diff --git a/ch06/README.md b/ch06/README.md
index ddc28bf..abcbb6e 100644
--- a/ch06/README.md
+++ b/ch06/README.md
@@ -1,10 +1,11 @@
 # Chapter 6: Finetuning for Classification
 
-
+&nbsp;
 ## Main Chapter Code
 
 - [01_main-chapter-code](01_main-chapter-code) contains the main chapter code
 
+&nbsp;
 ## Bonus Materials
 
 - [02_bonus_additional-experiments](02_bonus_additional-experiments) includes additional experiments (e.g., training the last vs first token, extending the input length, etc.)
diff --git a/ch07/README.md b/ch07/README.md
index b081489..2a3883c 100644
--- a/ch07/README.md
+++ b/ch07/README.md
@@ -1,9 +1,11 @@
 # Chapter 7: Finetuning to Follow Instructions
 
+&nbsp;
 ## Main Chapter Code
 
 - [01_main-chapter-code](01_main-chapter-code) contains the main chapter code and exercise solutions
 
+&nbsp;
 ## Bonus Materials
 
 - [02_dataset-utilities](02_dataset-utilities) contains utility code that can be used for preparing an instruction dataset