update notes

This commit is contained in:
rasbt 2024-04-02 18:27:13 -05:00
parent 0b47dfc381
commit 21140b98d4
2 changed files with 27 additions and 7 deletions


@ -204,9 +204,7 @@
"id": "ff856c58-8382-44c7-827f-798040e6e697",
"metadata": {},
"source": [
"- By convention, the unnormalized attention weights are referred to as **\"attention scores\"** whereas the normalized attention scores, which sum to 1, are referred to as **\"attention weights\"**\n",
"\n",
"- The attention weights and context vector calculation are summarized in the figure below:"
"- By convention, the unnormalized attention weights are referred to as **\"attention scores\"** whereas the normalized attention scores, which sum to 1, are referred to as **\"attention weights\"**\n"
]
},
{
@ -2002,7 +2000,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
"version": "3.10.6"
}
},
"nbformat": 4,


@ -7,7 +7,7 @@ As the Project Gutenberg website states, "the vast majority of Project Gutenberg
Please read the [Project Gutenberg Permissions, Licensing and other Common Requests](https://www.gutenberg.org/policy/permission.html) page for more information about using the resources provided by Project Gutenberg.
 
## How to use this code
## How to Use This Code
 
@ -17,7 +17,11 @@ In this section, we download books from Project Gutenberg using code from the [`
As of this writing, this will require approximately 50 GB of disk space, but it may be more depending on how much Project Gutenberg has grown since then.
Follow these steps to download the dataset:
 
#### Download instructions for Linux and macOS users
Linux and macOS users can follow these steps to download the dataset (if you are a Windows user, please see the note below):
1. `git clone https://github.com/pgcorpus/gutenberg.git`
@ -31,8 +35,26 @@ Follow these steps to download the dataset:
5. `cd ..`
 
#### Special instructions for Windows users
The [`pgcorpus/gutenberg`](https://github.com/pgcorpus/gutenberg) code is compatible with both Linux and macOS. However, Windows users would have to make small adjustments, such as adding `shell=True` to the `subprocess` calls and replacing `rsync`.
Alternatively, an easier way to run this code on Windows is by using the "Windows Subsystem for Linux" (WSL) feature, which allows users to run a Linux environment using Ubuntu in Windows. For more information, please read [Microsoft's official installation instructions](https://learn.microsoft.com/en-us/windows/wsl/install) and [tutorial](https://learn.microsoft.com/en-us/training/modules/wsl-introduction/).
When using WSL, please make sure you have Python 3 installed (check via `python3 --version`, or install it, for instance, with `sudo apt-get install -y python3.10` for Python 3.10) and install the following packages there:
```bash
sudo apt-get update && \
sudo apt-get upgrade -y && \
sudo apt-get install -y python3-pip && \
sudo apt-get install -y python-is-python3 && \
sudo apt-get install -y rsync
```
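For Windows users who prefer to adapt the code rather than use WSL, the `shell=True` adjustment mentioned above could look roughly like the following. This is a minimal sketch, not code from the `pgcorpus/gutenberg` repository: the helper name `run_command` is hypothetical, and the example runs the current Python interpreter only to show the call pattern.

```python
import platform
import subprocess
import sys


def run_command(args):
    """Run a command, passing shell=True only on Windows.

    Hypothetical helper for illustration; it is not part of
    the pgcorpus/gutenberg code base.
    """
    use_shell = platform.system() == "Windows"
    return subprocess.run(
        args,
        shell=use_shell,      # only enabled on Windows
        capture_output=True,  # collect stdout and stderr
        text=True,            # decode output to str
        check=True,           # raise CalledProcessError on failure
    )


# Example call: ask the current interpreter for its version.
result = run_command([sys.executable, "--version"])
print(result.stdout.strip() or result.stderr.strip())
```

Replacing `rsync` is a separate adjustment that this sketch does not cover.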
> [!NOTE]
> The [`pgcorpus/gutenberg`](https://github.com/pgcorpus/gutenberg) code is compatible with both Linux and macOS. However, Windows users would have to make small adjustments, such as adding `shell=True` to the `subprocess` calls and replacing `rsync`. Alternatively, an easier way to run this code on Windows is by using the "Windows Subsystem for Linux" feature, which allows users to run a Linux environment in Windows. For more information, please read [Microsoft's official installation instructions](https://learn.microsoft.com/en-us/windows/wsl/install) and [tutorial](https://learn.microsoft.com/en-us/training/modules/wsl-introduction/).
> Instructions on how to set up Python and install packages can be found in [Appendix A: Optional Python Setup Preferences](../../appendix-A/01_optional-python-setup-preferences/README.md) and [Appendix A: Installing Python Libraries](../../appendix-A/02_installing-python-libraries/README.md).
>
> Optionally, a Docker image running Ubuntu is provided with this repository. After cloning the [`pgcorpus/gutenberg`](https://github.com/pgcorpus/gutenberg) GitHub repository, copy the *.devcontainer* folder of this `LLMs-from-scratch` repository into the locally cloned `gutenberg` repository. Instructions on how to run a container with the provided Docker image can be found in [Appendix A: Optional Docker Environment](../../appendix-A/04_optional-docker-environment/README.md).
 
### 2) Prepare the dataset