Organized setup instructions (#115)
* Organized setup instructions
* update tests
* link checker action
* raise error upon broken link
* fix links
* fix links
* delete duplicated paragraph
.github/workflows/basic-tests.yml (2 changes)

@@ -34,7 +34,7 @@ jobs:
         run: |
           pytest ch04/01_main-chapter-code/tests.py
           pytest ch05/01_main-chapter-code/tests.py
-          pytest appendix-A/02_installing-python-libraries/tests.py
+          pytest setup/02_installing-python-libraries/tests.py

       - name: Validate Selected Jupyter Notebooks
         run: |
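For local verification, the same tests the workflow runs can be invoked directly; a minimal sketch, assuming the commands are run from the repository root with the project's `requirements.txt` installed:

```bash
# Mirrors the CI step above, including the relocated setup tests
pytest ch04/01_main-chapter-code/tests.py
pytest ch05/01_main-chapter-code/tests.py
pytest setup/02_installing-python-libraries/tests.py
```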
.github/workflows/check-links.yml (new file, 24 lines)

@@ -0,0 +1,24 @@
+name: Check Markdown Links
+
+on:
+  push:
+    branches:
+      - main
+  pull_request:
+    branches:
+      - main
+
+jobs:
+  check-links:
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout Repository
+        uses: actions/checkout@v3
+
+      - name: Install Markdown Link Checker
+        run: npm install -g markdown-link-check
+
+      - name: Find Markdown Files and Check Links
+        run: |
+          find . -name '*.md' -exec markdown-link-check {} \;
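One note on the "raise error upon broken link" goal: `find ... -exec ... \;` does not propagate `markdown-link-check`'s exit status to the workflow step, so the job can pass even if a file contains dead links. A possible variant of the last `run` command, assuming `markdown-link-check` exits with a non-zero status when it finds broken links:

```bash
# xargs exits non-zero if any markdown-link-check invocation fails,
# which makes the workflow step (and therefore the job) fail on broken links.
find . -name '*.md' -print0 | xargs -0 -n1 markdown-link-check
```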
README.md (29 changes)

@@ -24,7 +24,7 @@ The method described in this book for training and developing your own small-but

 # Table of Contents

-Please note that the `Readme.md` file is a Markdown (`.md`) file. If you have downloaded this code bundle from the Manning website and are viewing it on your local computer, I recommend using a Markdown editor or previewer for proper viewing. If you haven't installed a Markdown editor yet, [MarkText](https://www.marktext.cc) is a good free option.
+Please note that this `README.md` file is a Markdown (`.md`) file. If you have downloaded this code bundle from the Manning website and are viewing it on your local computer, I recommend using a Markdown editor or previewer for proper viewing. If you haven't installed a Markdown editor yet, [MarkText](https://www.marktext.cc) is a good free option.

 Alternatively, you can view this and other files on GitHub at [https://github.com/rasbt/LLMs-from-scratch](https://github.com/rasbt/LLMs-from-scratch).

@@ -36,17 +36,23 @@ Alternatively, you can view this and other files on GitHub at [https://github.co

 <br>

+
+> [!TIP]
+> If you're seeking guidance on installing Python and Python packages and setting up your code environment, I suggest reading the [README.md](setup/README.md) file located in the [setup](setup) directory.
+
+<br>
+
 | Chapter Title | Main Code (for quick access) | All Code + Supplementary |
 |------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------|-------------------------------|
 | Ch 1: Understanding Large Language Models | No code | - |
 | Ch 2: Working with Text Data | - [ch02.ipynb](ch02/01_main-chapter-code/ch02.ipynb)<br/>- [dataloader.ipynb](ch02/01_main-chapter-code/dataloader.ipynb) (summary)<br/>- [exercise-solutions.ipynb](ch02/01_main-chapter-code/exercise-solutions.ipynb) | [./ch02](./ch02) |
 | Ch 3: Coding Attention Mechanisms | - [ch03.ipynb](ch03/01_main-chapter-code/ch03.ipynb)<br/>- [multihead-attention.ipynb](ch03/01_main-chapter-code/multihead-attention.ipynb) (summary) <br/>- [exercise-solutions.ipynb](ch03/01_main-chapter-code/exercise-solutions.ipynb)| [./ch03](./ch03) |
 | Ch 4: Implementing a GPT Model from Scratch | - [ch04.ipynb](ch04/01_main-chapter-code/ch04.ipynb)<br/>- [gpt.py](ch04/01_main-chapter-code/gpt.py) (summary)<br/>- [exercise-solutions.ipynb](ch04/01_main-chapter-code/exercise-solutions.ipynb) | [./ch04](./ch04) |
-| Ch 5: Pretraining on Unlabeled Data | - [ch05.ipynb](ch05/01_main-chapter-code/ch05.ipynb)<br/>- [train.py](ch05/01_main-chapter-code/train.py) (summary) <br/>- [generate.py](ch05/01_main-chapter-code/generate.py) (summary) <br/>- [exercise-solutions.ipynb](ch05/01_main-chapter-code/exercise-solutions.ipynb) | [./ch05](./ch05) |
+| Ch 5: Pretraining on Unlabeled Data | - [ch05.ipynb](ch05/01_main-chapter-code/ch05.ipynb)<br/>- [gpt_train.py](ch05/01_main-chapter-code/gpt_train.py) (summary) <br/>- [gpt_generate.py](ch05/01_main-chapter-code/gpt_generate.py) (summary) <br/>- [exercise-solutions.ipynb](ch05/01_main-chapter-code/exercise-solutions.ipynb) | [./ch05](./ch05) |
 | Ch 6: Finetuning for Text Classification | Q2 2024 | ... |
 | Ch 7: Finetuning with Human Feedback | Q2 2024 | ... |
 | Ch 8: Using Large Language Models in Practice | Q2/3 2024 | ... |
-| Appendix A: Introduction to PyTorch | - [code-part1.ipynb](appendix-A/03_main-chapter-code/code-part1.ipynb)<br/>- [code-part2.ipynb](appendix-A/03_main-chapter-code/code-part2.ipynb)<br/>- [DDP-script.py](appendix-A/03_main-chapter-code/DDP-script.py)<br/>- [exercise-solutions.ipynb](appendix-A/03_main-chapter-code/exercise-solutions.ipynb) | [./appendix-A](./appendix-A) |
+| Appendix A: Introduction to PyTorch | - [code-part1.ipynb](appendix-A/01_main-chapter-code/code-part1.ipynb)<br/>- [code-part2.ipynb](appendix-A/01_main-chapter-code/code-part2.ipynb)<br/>- [DDP-script.py](appendix-A/01_main-chapter-code/DDP-script.py)<br/>- [exercise-solutions.ipynb](appendix-A/01_main-chapter-code/exercise-solutions.ipynb) | [./appendix-A](./appendix-A) |
 | Appendix B: References and Further Reading | No code | - |
 | Appendix C: Exercises | No code | - |
 | Appendix D: Adding Bells and Whistles to the Training Loop | - [appendix-D.ipynb](appendix-D/01_main-chapter-code/appendix-D.ipynb) | [./appendix-D](./appendix-D) |
@@ -54,11 +60,6 @@ Alternatively, you can view this and other files on GitHub at [https://github.co


-
-
-> [!TIP]
-> Please see [this](appendix-A/01_optional-python-setup-preferences) and [this](appendix-A/02_installing-python-libraries) folder if you need more guidance on installing Python and Python packages.
-
 <br>
 <br>

@@ -74,10 +75,10 @@ Shown below is a mental model summarizing the contents covered in this book.

 Several folders contain optional materials as a bonus for interested readers:

-- **Appendix A:**
-  - [Python Setup Tips](appendix-A/01_optional-python-setup-preferences)
-  - [Installing Libraries Used In This Book](appendix-A/02_installing-python-libraries)
-  - [Docker Environment Setup Guide](appendix-A/04_optional-docker-environment)
+- **Setup**
+  - [Python Setup Tips](setup/01_optional-python-setup-preferences)
+  - [Installing Libraries Used In This Book](setup/02_installing-python-libraries)
+  - [Docker Environment Setup Guide](setup/03_optional-docker-environment)

 - **Chapter 2:**
   - [Comparing Various Byte Pair Encoding (BPE) Implementations](ch02/02_bonus_bytepair-encoder)
@@ -88,9 +89,9 @@ Several folders contain optional materials as a bonus for interested readers:

 - **Chapter 5:**
   - [Alternative Weight Loading from Hugging Face Model Hub using Transformers](ch05/02_alternative_weight_loading/weight-loading-hf-transformers.ipynb)
   - [Pretraining GPT on the Project Gutenberg Dataset](ch05/03_bonus_pretraining_on_gutenberg)
   - [Adding Bells and Whistles to the Training Loop](ch05/04_learning_rate_schedulers)
-  - [Optimizing Hyperparameters for Pretraining](05_bonus_hparam_tuning)
+  - [Optimizing Hyperparameters for Pretraining](ch05/05_bonus_hparam_tuning)

 <br>
 <br>

@@ -1,3 +0,0 @@
-# Optional Docker Environment
-
-This is an optional Docker environment for those users who prefer Docker. For more instructions, see the *Docker Environment Setup Guide* in [appendix-A/04_optional-docker-environment](../).
@@ -2,6 +2,6 @@

 - [ch05.ipynb](ch05.ipynb) contains all the code as it appears in the chapter
 - [previous_chapters.py](previous_chapters.py) is a Python module that contains the `MultiHeadAttention` module from the previous chapter, which we import in [ch05.ipynb](ch05.ipynb) to pretrain the GPT model
-- [train.py](train.py) is a standalone Python script file with the code that we implemented in [ch05.ipynb](ch05.ipynb) to train the GPT model
-- [generate.py](generate.py) is a standalone Python script file with the code that we implemented in [ch05.ipynb](ch05.ipynb) to load and use the pretrained model weights from OpenAI
+- [gpt_train.py](gpt_train.py) is a standalone Python script file with the code that we implemented in [ch05.ipynb](ch05.ipynb) to train the GPT model
+- [gpt_generate.py](gpt_generate.py) is a standalone Python script file with the code that we implemented in [ch05.ipynb](ch05.ipynb) to load and use the pretrained model weights from OpenAI

@@ -1383,7 +1383,7 @@
    "id": "de713235-1561-467f-bf63-bf11ade383f0",
    "metadata": {},
    "source": [
-    "**If you are interested in augmenting this training function with more advanced techniques, such as learning rate warmup, cosine annealing, and gradient clipping, please refer to [Appendix D](../../appendix-D/03_main-chapter-code)**"
+    "**If you are interested in augmenting this training function with more advanced techniques, such as learning rate warmup, cosine annealing, and gradient clipping, please refer to [Appendix D](../../appendix-D/01_main-chapter-code)**"
    ]
   },
   {

@@ -2438,7 +2438,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.10.12"
+   "version": "3.12.2"
   }
  },
 "nbformat": 4,

@@ -42,7 +42,7 @@ cd gutenberg

 ```bash
 pip install -r requirements.txt
 ```

 5. Download the data:

 ```bash
 python get_data.py
@@ -71,9 +71,9 @@ sudo apt-get install -y rsync && \
 ```

 > [!NOTE]
-> Instructions about how to set up Python and installing packages can be found in [Appendix A: Optional Python Setup Preferences](../../appendix-A/01_optional-python-setup-preferences/README.md) and [Appendix A: Installing Python Libraries](../../appendix-A/02_installing-python-libraries/README.md).
+> Instructions about how to set up Python and installing packages can be found in [Optional Python Setup Preferences](../../setup/01_optional-python-setup-preferences/README.md) and [Installing Python Libraries](../../setup/02_installing-python-libraries/README.md).
 >
-> Optionally, a Docker image running Ubuntu is provided with this repository. Instructions about how to run a container with the provided Docker image can be found in [Appendix A: Optional Docker Environment](../../appendix-A/04_optional-docker-environment/README.md).
+> Optionally, a Docker image running Ubuntu is provided with this repository. Instructions about how to run a container with the provided Docker image can be found in [Optional Docker Environment](../../setup/03_optional-docker-environment/README.md).


 ### 2) Prepare the dataset
@@ -161,7 +161,7 @@ Note that this code focuses on keeping things simple and minimal for educational
 3. Update the `train_model_simple` script by adding the features introduced in [Appendix D: Adding Bells and Whistles to the Training Loop](../../appendix-D/01_main-chapter-code/appendix-D.ipynb), namely, cosine decay, linear warmup, and gradient clipping.
 4. Update the pretraining script to save the optimizer state (see section *5.4 Loading and saving weights in PyTorch* in chapter 5; [ch05.ipynb](../../ch05/01_main-chapter-code/ch05.ipynb)) and add the option to load an existing model and optimizer checkpoint and continue training if the training run was interrupted.
 5. Add a more advanced logger (for example, Weights and Biases) to view the loss and validation curves live.
-6. Add distributed data parallelism (DDP) and train the model on multiple GPUs (see section *A.9.3 Training with multiple GPUs* in appendix A; [DDP-script.py](../../appendix-A/03_main-chapter-code/DDP-script.py)).
+6. Add distributed data parallelism (DDP) and train the model on multiple GPUs (see section *A.9.3 Training with multiple GPUs* in appendix A; [DDP-script.py](../../appendix-A/01_main-chapter-code/DDP-script.py)).
 7. Swap the from-scratch `MultiheadAttention` class in the `previous_chapter.py` script with the efficient `MHAPyTorchScaledDotProduct` class implemented in the [Efficient Multi-Head Attention Implementations](../../ch03/02_bonus_efficient-multihead-attention/mha-implementations.ipynb) bonus section, which uses Flash Attention via PyTorch's `nn.functional.scaled_dot_product_attention` function.
 8. Speed up the training by optimizing the model via [torch.compile](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) (`model = torch.compile(model)`) or [thunder](https://github.com/Lightning-AI/lightning-thunder) (`model = thunder.jit(model)`).
 9. Implement Gradient Low-Rank Projection (GaLore) to further speed up the pretraining process. This can be achieved by replacing the `AdamW` optimizer with the `GaLoreAdamW` optimizer provided in the [GaLore Python library](https://github.com/jiaweizzhao/GaLore).
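To make item 3 above more concrete, here is a minimal, hedged sketch of linear warmup, cosine decay, and gradient clipping around a single training step; the function names, hyperparameter values, and loss computation are illustrative placeholders, not the book's actual `train_model_simple` code:

```python
import math
import torch

def lr_at_step(step, max_steps, peak_lr=5e-4, min_lr=1e-5, warmup_steps=100):
    """Linear warmup to peak_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

def train_step(model, optimizer, input_batch, target_batch, step, max_steps):
    # Set the scheduled learning rate for this step
    for param_group in optimizer.param_groups:
        param_group["lr"] = lr_at_step(step, max_steps)
    optimizer.zero_grad()
    logits = model(input_batch)  # expected shape: (batch, seq_len, vocab_size)
    loss = torch.nn.functional.cross_entropy(
        logits.flatten(0, 1), target_batch.flatten()
    )
    loss.backward()
    # Clip gradients to a maximum norm of 1.0 to stabilize training
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return loss.item()
```

Applying the schedule per step (rather than per epoch) is the usual choice for pretraining, where a single pass over the data can span many thousands of steps.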
@@ -1,10 +1,6 @@
 # Optimizing Hyperparameters for Pretraining

-The [hparam_search.py](hparam_search.py) is script based on the extended training function in [
-Appendix D: Adding Bells and Whistles to the Training Loop](../appendix-D/01_main-chapter-code/appendix-D.ipynb) to find optimal hyperparameters via grid search
-
-The [hparam_search.py](hparam_search.py) script, based on the extended training function in [
-Appendix D: Adding Bells and Whistles to the Training Loop](../appendix-D/01_main-chapter-code/appendix-D.ipynb), is designed to find optimal hyperparameters via grid search.
+The [hparam_search.py](hparam_search.py) script, based on the extended training function in [Appendix D: Adding Bells and Whistles to the Training Loop](../../appendix-D/01_main-chapter-code/appendix-D.ipynb), is designed to find optimal hyperparameters via grid search.

 >[!NOTE]
 This script will take a long time to run. You may want to reduce the number of hyperparameter configurations explored in the `HPARAM_GRID` dictionary at the top.
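For readers who want a picture of what a grid search over such a dictionary looks like in principle, here is a minimal sketch of the general pattern; the grid values and the scoring function are placeholders rather than the actual contents of `hparam_search.py`:

```python
import itertools

# Placeholder grid; the real script defines its own HPARAM_GRID at the top.
HPARAM_GRID = {
    "learning_rate": [1e-4, 5e-4, 1e-3],
    "batch_size": [4, 8],
    "weight_decay": [0.0, 0.1],
}

def evaluate(config):
    # Placeholder stand-in for "train briefly and return the validation loss".
    return config["learning_rate"] * config["batch_size"] + config["weight_decay"]

best_config, best_loss = None, float("inf")
for values in itertools.product(*HPARAM_GRID.values()):
    config = dict(zip(HPARAM_GRID.keys(), values))
    loss = evaluate(config)
    if loss < best_loss:
        best_config, best_loss = config, loss

print(f"Best configuration: {best_config} (loss {best_loss:.4f})")
```

The number of configurations grows multiplicatively with each grid entry, which is why the note above warns about long run times.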
@@ -4,4 +4,4 @@
 - [02_alternative_weight_loading](02_alternative_weight_loading) contains code to load the GPT model weights from alternative places in case the model weights become unavailable from OpenAI
 - [03_bonus_pretraining_on_gutenberg](03_bonus_pretraining_on_gutenberg) contains code to pretrain the LLM longer on the whole corpus of books from Project Gutenberg
 - [04_learning_rate_schedulers](04_learning_rate_schedulers) contains code implementing a more sophisticated training function including learning rate schedulers and gradient clipping
-- [05_hparam_tuning](05_hparam_tuning) contains an optional hyperparameter tuning script
+- [05_bonus_hparam_tuning](05_bonus_hparam_tuning) contains an optional hyperparameter tuning script
[12 image files relocated; dimensions and file sizes unchanged]
@@ -0,0 +1,3 @@
+# Optional Docker Environment
+
+This is an optional Docker environment for those users who prefer Docker. In case you are interested in using this Docker DevContainer, please see the *Using Docker DevContainers* section in the [../../README.md](../../README.md) for more information.
@@ -27,10 +27,8 @@ git clone https://github.com/rasbt/LLMs-from-scratch.git
 cd LLMs-from-scratch
 ```

 2. Move the `.devcontainer` file to the main `LLMs-from-scratch` project directory.

 ```bash
-mv appendix-A/04_optional-docker-environment/.devcontainer ./
+mv setup/03_optional-docker-environment/.devcontainer ./
 ```

 3. In Docker Desktop, make sure that ***desktop-linux* builder** is running and will be used to build the Docker container (see *Docker Desktop* -> *Change settings* -> *Builders* -> *desktop-linux* -> *...* -> *Use*)
@@ -19,11 +19,30 @@ pip install -r requirements.txt

 If you don't have Python set up on your machine yet, I have written about my personal Python setup preferences in the following directories:

-- [../appendix-A/01_optional-python-setup-preferences](../appendix-A/01_optional-python-setup-preferences)
-- [../02_installing-python-libraries](../appendix-A/02_installing-python-libraries)
+- [01_optional-python-setup-preferences](./01_optional-python-setup-preferences)
+- [02_installing-python-libraries](./02_installing-python-libraries)

+The *Using DevContainers* section below outlines an alternative approach for installing project dependencies on your machine.
+
+
+## Using Docker DevContainers
+
+As an alternative to the *Setting up Python* section above, if you prefer a development setup that isolates a project's dependencies and configurations, using Docker is a highly effective solution. This approach eliminates the need to manually install software packages and libraries and ensures a consistent development environment. You can find more instructions for setting up Docker and using a DevContainer:
+
+- [03_optional-docker-environment](03_optional-docker-environment)
+
+
+## Visual Studio Code Editor
+
+There are many good options for code editors. My preferred choice is the popular open-source [Visual Studio Code (VSCode)](https://code.visualstudio.com) editor, which can be easily enhanced with many useful plugins and extensions (see the *VSCode Extensions* section below for more information). Download instructions for macOS, Linux, and Windows can be found on the [main VSCode website](https://code.visualstudio.com).
+
+
+## VSCode Extensions
+
+If you are using Visual Studio Code (VSCode) as your primary code editor, you can find recommended extensions in the `.vscode` subfolder. To install these, open the `extensions.json` file in VSCode and click the "Install" button in the pop-up menu on the lower right.

@@ -44,18 +63,6 @@ You can optionally run the code on a GPU by changing the *Runtime* as illustrate

 <img src="./figures/3.webp" alt="3" width="700">


-
-## Using DevContainers
-
-Alternatively, If you prefer a development setup that isolates a project's dependencies and configurations, using Docker is a highly effective solution. This approach eliminates the need to manually install software packages and libraries and ensures a consistent development environment. You can find more instructions for setting up Docker and using a DevContainer here in [../appendix-A/04_optional-docker-environment](../appendix-A/04_optional-docker-environment).
-
-
-## VSCode extensions
-
-If you are using Visual Studio Code (VSCode) as your primary code editor, you can find recommended extensions in the `.vscode` subfolder. To install these, open the `extensions.json` file in VSCode and click the "Install" button in the pop-up menu on the lower right.


 ## Questions?
[3 image files relocated; dimensions and file sizes unchanged]