GitHub markdown updates (#545)

* GitHub markdown updates

* Apply suggestions from code review

* Apply suggestions from code review
Sebastian Raschka 2025-02-23 12:25:44 -06:00 committed by GitHub
parent 11801be0e9
commit fa5760a8de
8 changed files with 61 additions and 57 deletions

@ -48,7 +48,7 @@ You can alternatively view this and other files on GitHub at [https://github.com
<br>
<!-- -->
> [!TIP]
> **Tip:**
> If you're seeking guidance on installing Python and Python packages and setting up your code environment, I suggest reading the [README.md](setup/README.md) file located in the [setup](setup) directory.
<br>

@ -70,7 +70,7 @@ sudo apt-get install -y python-is-python3 && \
sudo apt-get install -y rsync
```
> [!NOTE]
> **Note:**
> Instructions on how to set up Python and install packages can be found in [Optional Python Setup Preferences](../../setup/01_optional-python-setup-preferences/README.md) and [Installing Python Libraries](../../setup/02_installing-python-libraries/README.md).
>
> Optionally, a Docker image running Ubuntu is provided with this repository. Instructions about how to run a container with the provided Docker image can be found in [Optional Docker Environment](../../setup/03_optional-docker-environment/README.md).
@ -94,10 +94,10 @@ Skipping gutenberg/data/raw/PG29836_raw.txt as it does not contain primarily Eng
```
> [!TIP]
> **Tip:**
> For simplicity, the produced files are stored in plaintext format and are not pre-tokenized. However, you may want to update the code to store the dataset in a pre-tokenized form to save computation time if you are planning to use the dataset more often or train for multiple epochs. See the *Design Decisions and Improvements* at the bottom of this page for more information.
> [!TIP]
> **Tip:**
> You can choose smaller file sizes, for example, 50 MB. This will result in more files but might be useful for quicker pretraining runs on a small number of files for testing purposes.
@ -145,7 +145,7 @@ The output will be formatted in the following way:
&nbsp;
> [!TIP]
> **Tip:**
> In practice, if you are using macOS or Linux, I recommend using the `tee` command to save the log outputs to a `log.txt` file in addition to printing them on the terminal:
```bash
@ -153,7 +153,7 @@ python -u pretraining_simple.py | tee log.txt
```
&nbsp;
> [!WARNING]
> **Warning:**
> Note that training on one of the ~500 MB text files in the `gutenberg_preprocessed` folder will take approximately 4 hours on a V100 GPU.
> The folder contains 47 files, so training on all of them will take approximately 200 hours (more than 1 week). You may want to run it on a smaller number of files.
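
As a rough check of the estimate in the warning above, the total can be reproduced by multiplying the per-file time by the number of files. This is a minimal sketch; the function name `estimate_training_hours` is hypothetical and not part of the repository, and the 4-hour figure is the approximate V100 number quoted in the warning.

```python
# Approximate figures taken from the warning text (V100 GPU, ~500 MB per file).
HOURS_PER_FILE = 4
NUM_FILES = 47


def estimate_training_hours(num_files, hours_per_file=HOURS_PER_FILE):
    """Return a rough total training time in hours (hypothetical helper)."""
    return num_files * hours_per_file


total_hours = estimate_training_hours(NUM_FILES)
# 47 files * 4 hours = 188 hours, which the warning rounds up to about 200
print(f"{NUM_FILES} files: ~{total_hours} h (~{total_hours / 24:.1f} days)")
```

The same helper can be called with a smaller `num_files` to budget a shorter test run.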

@ -6,7 +6,8 @@ There are several ways to install Python and set up your computing environment.
<br>
> [!NOTE] If you are running any of the notebooks on Google Colab and want to install the dependencies, simply run the following code in a new cell at the top of the notebook and skip the rest of this tutorial:
> **Note:**
> If you are running any of the notebooks on Google Colab and want to install the dependencies, simply run the following code in a new cell at the top of the notebook and skip the rest of this tutorial:
> `pip install uv && uv pip install --system -r https://raw.githubusercontent.com/rasbt/LLMs-from-scratch/refs/heads/main/requirements.txt`
The remaining sections below describe how you can manage your Python environment and packages on your local machine.
@ -24,7 +25,7 @@ In this tutorial, I am using a computer running macOS, but this workflow is simi
This section guides you through the Python setup and package installation procedure using `uv` via its `uv pip` interface. The `uv pip` interface may feel more familiar than the native `uv` commands to Python users who have used pip before.
&nbsp;
> [!NOTE]
> **Note:**
> There are alternative ways to install Python and use `uv`. For example, you can install Python directly via `uv` and use `uv add` instead of `uv pip install` for even faster package management.
>
> If you are a macOS or Linux user and prefer the native `uv` commands, refer to the [./native-uv.md tutorial](./native-uv.md). I also recommend checking the official [`uv` documentation](https://docs.astral.sh/uv/).
@ -49,7 +50,11 @@ python --version
If it returns 3.10 or newer, no further action is required.
&nbsp;
> [!NOTE]
> **Note:**
> If `python --version` indicates that no Python version is installed, you may also want to check `python3 --version` since your system might be configured to use the `python3` command instead.
&nbsp;
> **Note:**
> I recommend installing a Python version that is at least 2 versions older than the most recent release to ensure PyTorch compatibility. For example, if the most recent version is Python 3.13, I recommend installing version 3.10 or 3.11.
Otherwise, if Python is not installed or is an older version, you can install it for your operating system as described below.
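
The version requirement stated above (3.10 or newer) can also be checked from inside Python, which avoids the `python` versus `python3` ambiguity on the command line. The following is a minimal sketch; `MIN_VERSION` and `is_supported` are hypothetical names used for illustration, not part of the repository.

```python
import sys

# Minimum version stated in the tutorial text; adjust if the requirement changes.
MIN_VERSION = (3, 10)


def is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if is_supported():
    print("Python version is 3.10 or newer; no further action is required.")
else:
    print("Python is older than 3.10; consider installing a newer version.")
```

Running this script with the interpreter you intend to use reports whether that interpreter meets the requirement.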
@ -118,7 +123,7 @@ source .venv/bin/activate
```
&nbsp;
> [!NOTE]
> **Note:**
> If you are using Windows, you may have to replace the command above with `source .venv/Scripts/activate` or `.venv/Scripts/activate`.
@ -157,7 +162,7 @@ uv pip install -r https://raw.githubusercontent.com/rasbt/LLMs-from-scratch/refs
&nbsp;
> [!NOTE]
> **Note:**
> If you have problems with the commands above due to certain dependencies (for example, if you are using Windows), you can always fall back to using regular pip:
> `pip install -r requirements.txt`
> or

@ -37,7 +37,7 @@ wget -qO- https://pixi.sh/install.sh | sh
powershell -ExecutionPolicy ByPass -c "irm -useb https://pixi.sh/install.ps1 | iex"
```
> [!NOTE]
> **Note:**
> For more installation options, please refer to the official [pixi documentation](https://pixi.sh/latest/).
@ -50,7 +50,7 @@ You can install Python using pixi:
pixi add python=3.10
```
> [!NOTE]
> **Note:**
> I recommend installing a Python version that is at least 2 versions older than the most recent release to ensure PyTorch compatibility. For example, if the most recent version is Python 3.13, I recommend installing version 3.10 or 3.11. You can find out the most recent Python version by visiting [python.org](https://www.python.org).
&nbsp;
@ -62,7 +62,7 @@ To install all required packages from a `pixi.toml` file (such as the one locate
pixi install
```
> [!NOTE]
> **Note:**
> If you encounter issues with dependencies (for example, if you are using Windows), you can always fall back to pip: `pixi run pip install -U -r requirements.txt`
By default, `pixi install` will create a separate virtual environment specific to the project.

@ -49,7 +49,7 @@ powershell -c "irm https://astral.sh/uv/install.ps1 | more"
&nbsp;
> [!NOTE]
> **Note:**
> For more installation options, please refer to the official [uv documentation](https://docs.astral.sh/uv/getting-started/installation/#standalone-installer).
&nbsp;
@ -61,12 +61,11 @@ To install all required packages from a `pyproject.toml` file (such as the one l
uv sync --dev --python 3.11
```
> [!NOTE]
> **Note:**
> If you do not have Python 3.11 available on your system, uv will download and install it for you.
>
> I recommend using a Python version that is one to three versions older than the most recent release to ensure PyTorch compatibility. For example, if the most recent version is Python 3.13, I recommend using version 3.10, 3.11, or 3.12. You can find out the most recent Python version by visiting [python.org](https://www.python.org/downloads/).
> [!NOTE]
> **Note:**
> If you have problems with the commands above due to certain dependencies (for example, if you are using Windows), you can always fall back to regular pip:
> `uv add pip`
> `uv run python -m pip install -U -r requirements.txt`

@ -6,7 +6,7 @@ I used the following libraries listed [here](https://github.com/rasbt/LLMs-from-
> [!NOTE]
> **Note:**
> If you are using `uv` as described in [Option 1: Using uv](../01_optional-python-setup-preferences/README.md), you can replace `pip` with `uv pip` in the commands below. For example, `pip install -r requirements.txt` becomes `uv pip install -r requirements.txt`.

@ -86,7 +86,7 @@ The entire process is automated and might take a few minutes, depending on your
Once completed, VS Code will automatically connect to the container and reopen the project within the newly created Docker development environment. You will be able to write, execute, and debug code as if it were running on your local machine, but with the added benefits of Docker's isolation and consistency.
> [!WARNING]
> **Warning:**
> If you encounter an error during the build process, your machine likely does not support the NVIDIA Container Toolkit because it does not have a compatible GPU. In this case, edit the `devcontainer.json` file to remove the `"runArgs": ["--runtime=nvidia", "--gpus=all"],` line and run the "Reopen Dev Container" procedure again.
9. Finished.

@ -15,7 +15,7 @@ pip install -r requirements.txt
<br>
> [!NOTE] If you are running any of the notebooks on Google Colab and want to install the dependencies, simply run the following code in a new cell at the top of the notebook:
> **Note:** If you are running any of the notebooks on Google Colab and want to install the dependencies, simply run the following code in a new cell at the top of the notebook:
> `pip install uv && uv pip install --system -r https://raw.githubusercontent.com/rasbt/LLMs-from-scratch/refs/heads/main/requirements.txt`