Added PDF display support to Docker image and VS Code and updated first step for gutenberg project (#111)

* added VS Code extensions recommendations

* Added PDF display support to Docker image and VS Code

* fixed steps to download the dataset
Daniel Kleine 2024-04-09 02:37:55 +02:00 committed by GitHub
parent 58d5bd9e39
commit 61b6e35ddf
3 changed files with 4 additions and 2 deletions

@@ -11,7 +11,8 @@
"ms-python.python",
"ms-azuretools.vscode-docker",
"ms-toolsai.jupyter",
-"yahyabatulu.vscode-markdown-alert"
+"yahyabatulu.vscode-markdown-alert",
+"tomoki1207.pdf"
]
}
}

@@ -5,5 +5,6 @@
"ms-azuretools.vscode-docker",
"ms-vscode-remote.vscode-remote-extensionpack",
"yahyabatulu.vscode-markdown-alert",
+"tomoki1207.pdf",
]
}

@@ -23,7 +23,7 @@ As of this writing, this will require approximately 50 GB of disk space, but it
Linux and macOS users can follow these steps to download the dataset (if you are a Windows user, please see the note below):
-Set the `03_bonus_pretraining_on_gutenberg` folder as working directory to clone the `gutenberg` repository locally in this folder (this is necessary to run the provided scripts `prepare_dataset.py` and `pretraining_simple.py`). For instance, when being in the `LLMs-from-scratch` repository's folder, navigate into the *03_bonus_pretraining_on_gutenberg* folder via:
+1. Set the `03_bonus_pretraining_on_gutenberg` folder as working directory to clone the `gutenberg` repository locally in this folder (this is necessary to run the provided scripts `prepare_dataset.py` and `pretraining_simple.py`). For instance, when being in the `LLMs-from-scratch` repository's folder, navigate into the *03_bonus_pretraining_on_gutenberg* folder via:
```bash
cd ch05/03_bonus_pretraining_on_gutenberg
```