Mirror of https://github.com/rasbt/LLMs-from-scratch.git (synced 2025-10-29 17:01:30 +00:00)
Gutenberg for Windows users (#99)
Parent f30dd2dd2b, commit 5af3834760
@ -13,6 +13,8 @@ Please read the [Project Gutenberg Permissions, Licensing and other Common Reque
### 1) Download the dataset
In this section, we download books from Project Gutenberg using code from the [`pgcorpus/gutenberg`](https://github.com/pgcorpus/gutenberg) GitHub repository.
As of this writing, this requires approximately 50 GB of disk space, but it may require more depending on how much Project Gutenberg has grown since then.
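Before starting the download, it can help to confirm that the target drive has enough free space. The following is a minimal sketch using the standard-library `shutil.disk_usage`. The helper name `has_enough_space` is hypothetical and not part of `pgcorpus/gutenberg`, and the 50 GB threshold simply reuses the estimate above, interpreted as decimal gigabytes.

```python
import shutil

# Approximate requirement from the estimate above (decimal GB). The real
# figure may be larger because Project Gutenberg keeps growing.
REQUIRED_GB = 50


def has_enough_space(path=".", required_gb=REQUIRED_GB):
    """Return True if the filesystem containing `path` has at least `required_gb` GB free."""
    free_bytes = shutil.disk_usage(path).free
    free_gb = free_bytes / 1e9  # convert bytes to decimal gigabytes
    return free_gb >= required_gb


if __name__ == "__main__":
    usage = shutil.disk_usage(".")
    print(f"Free space: {usage.free / 1e9:.1f} GB")
    if not has_enough_space("."):
        print(f"Warning: less than {REQUIRED_GB} GB free; the download may not fit.")
```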
Follow these steps to download the dataset:
@ -28,6 +30,10 @@ Follow these steps to download the dataset:
5. `cd ..`
> [!NOTE]
> The [`pgcorpus/gutenberg`](https://github.com/pgcorpus/gutenberg) code is compatible with both Linux and macOS. However, Windows users need to make small adjustments, such as adding `shell=True` to the `subprocess` calls and replacing `rsync`. Alternatively, an easier way to run this code on Windows is to use the "Windows Subsystem for Linux" (WSL) feature, which lets users run a Linux environment in Windows. For more information, please read [Microsoft's official installation instructions](https://learn.microsoft.com/en-us/windows/wsl/install) and [tutorial](https://learn.microsoft.com/en-us/training/modules/wsl-introduction/).
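The `shell=True` adjustment mentioned in the note can be applied only on Windows. Below is a minimal sketch that assumes you route the `subprocess` calls through a small wrapper. The names `run_command` and `rsync_available` are hypothetical and are not functions from `pgcorpus/gutenberg`. The sketch does not replace `rsync`. It only checks whether an `rsync` executable is on `PATH`.

```python
import os
import shutil
import subprocess
import sys

# Per the Python docs, on Windows shell=True is needed for commands built
# into cmd.exe (e.g. `dir`, `copy`). We enable it only there.
IS_WINDOWS = os.name == "nt"


def run_command(args):
    """Run `args` (a list of strings) and return the CompletedProcess.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    return subprocess.run(
        args,
        shell=IS_WINDOWS,     # the Windows-specific adjustment
        check=True,           # raise on non-zero exit status
        capture_output=True,  # collect stdout/stderr
        text=True,            # decode output as str
    )


def rsync_available():
    """Return True if an `rsync` executable can be found on PATH."""
    return shutil.which("rsync") is not None


if __name__ == "__main__":
    result = run_command([sys.executable, "-c", "print('hello')"])
    print(result.stdout.strip())
    if not rsync_available():
        print("rsync not found; consider WSL or a replacement tool on Windows.")
```

Setting `shell=True` only on Windows matters. On Linux and macOS, passing a list together with `shell=True` treats the first element as the shell command and hands the remaining elements to the shell itself, so the arguments would not reach the program as intended.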
### 2) Prepare the dataset