mirror of https://github.com/rasbt/LLMs-from-scratch.git, synced 2025-10-29 17:01:30 +00:00
Gutenberg for Windows users (#99)
parent f30dd2dd2b
commit 5af3834760
@ -13,6 +13,8 @@ Please read the [Project Gutenberg Permissions, Licensing and other Common Reque
### 1) Download the dataset
In this section, we download books from Project Gutenberg using code from the [`pgcorpus/gutenberg`](https://github.com/pgcorpus/gutenberg) GitHub repository.

As of this writing, this will require approximately 50 GB of disk space, but it may be more depending on how much Project Gutenberg has grown since then.
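
Before starting the download, it may be worth confirming that the target drive has enough room. The following is a minimal sketch, not part of the `pgcorpus/gutenberg` code; the helper name `has_enough_space` and the `REQUIRED_GB` constant are hypothetical, and the 50 GB threshold is only the approximate figure quoted above.

```python
import shutil

# Approximate requirement quoted above; the corpus may have grown since then.
REQUIRED_GB = 50


def has_enough_space(path=".", required_gb=REQUIRED_GB):
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    free_gb = shutil.disk_usage(path).free / 1e9  # bytes -> decimal gigabytes
    return free_gb >= required_gb


if __name__ == "__main__":
    if has_enough_space():
        print("Enough free space for the download.")
    else:
        print(f"Less than {REQUIRED_GB} GB free; consider using another drive.")
```

Run the script from the directory where the dataset will be stored, or pass that directory as `path`.
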

Follow these steps to download the dataset:
@ -28,6 +30,10 @@ Follow these steps to download the dataset:
5. `cd ..`

> [!NOTE]
> The [`pgcorpus/gutenberg`](https://github.com/pgcorpus/gutenberg) code is compatible with both Linux and macOS. Windows users, however, need to make small adjustments, such as adding `shell=True` to the `subprocess` calls and replacing `rsync`. An easier way to run this code on Windows is the "Windows Subsystem for Linux" (WSL) feature, which lets users run a Linux environment in Windows. For more information, please read [Microsoft's official installation instructions](https://learn.microsoft.com/en-us/windows/wsl/install) and [tutorial](https://learn.microsoft.com/en-us/training/modules/wsl-introduction/).
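
To illustrate the kind of adjustment the note describes, here is a minimal sketch of a wrapper that passes `shell=True` to `subprocess.run` only on Windows, plus a check for whether `rsync` is on the `PATH`. The helper names `run_command` and `rsync_available` are hypothetical and do not come from the `pgcorpus/gutenberg` repository; which calls in that repository need changing is not specified here.

```python
import platform
import shutil
import subprocess
import sys

IS_WINDOWS = platform.system() == "Windows"


def run_command(args):
    """Run a command given as a list of strings.

    shell=True is enabled only on Windows, as the note suggests; on Linux
    and macOS the list is executed directly without a shell.
    """
    return subprocess.run(
        args,
        shell=IS_WINDOWS,
        check=True,           # raise CalledProcessError on a non-zero exit code
        capture_output=True,
        text=True,
    )


def rsync_available():
    """Return True if an `rsync` executable can be found on the PATH."""
    return shutil.which("rsync") is not None


if __name__ == "__main__":
    result = run_command([sys.executable, "-c", "print('ok')"])
    print(result.stdout.strip())
    if not rsync_available():
        print("rsync not found; a replacement download method or WSL is needed.")
```

Under WSL, neither adjustment should be necessary, since the code then runs in a Linux environment.
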
### 2) Prepare the dataset