mirror of https://github.com/rasbt/LLMs-from-scratch.git (synced 2025-10-30 17:29:59 +00:00)

	Gutenberg for Windows users (#99)
parent f30dd2dd2b
commit 5af3834760

@@ -13,6 +13,8 @@ Please read the [Project Gutenberg Permissions, Licensing and other Common Reque
 
 ### 1) Download the dataset
 
+In this section, we download books from Project Gutenberg using code from the [`pgcorpus/gutenberg`](https://github.com/pgcorpus/gutenberg) GitHub repository.
+
 As of this writing, this will require approximately 50 GB of disk space, but it may be more depending on how much Project Gutenberg grew since then.
 
 Follow these steps to download the dataset:
@@ -28,6 +30,10 @@ Follow these steps to download the dataset:
 
 5. `cd ..`
 
+ 
+> [!NOTE]
+> The [`pgcorpus/gutenberg`](https://github.com/pgcorpus/gutenberg) code is compatible with both Linux and macOS. However, Windows users would have to make small adjustments, such as adding `shell=True` to the `subprocess` calls and replacing `rsync`. Alternatively, an easier way to run this code on Windows is by using the "Windows Subsystem for Linux" feature, which allows users to run a Linux environment in Windows. For more information, please read [Microsoft's official installation instruction](https://learn.microsoft.com/en-us/windows/wsl/install) and [tutorial](https://learn.microsoft.com/en-us/training/modules/wsl-introduction/).
+
  
 ### 2) Prepare the dataset
 
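
The `shell=True` adjustment mentioned in the added note could look roughly like the sketch below. It is an illustration only: `run_command` is a hypothetical helper, not a function from `pgcorpus/gutenberg`, and the actual `subprocess` calls in that repository may be structured differently. The sketch assumes the goal is to route commands through the Windows shell while leaving Linux and macOS behavior unchanged. It does not cover replacing `rsync`, which the note lists as a separate adjustment.

```python
import subprocess
import sys


def run_command(cmd, **kwargs):
    """Run a command given as a list of strings (hypothetical helper).

    On Windows the command is passed through the shell (shell=True);
    on Linux and macOS it is executed directly (shell=False).
    """
    use_shell = sys.platform.startswith("win")
    return subprocess.run(
        cmd,
        shell=use_shell,
        check=True,           # raise CalledProcessError on a non-zero exit code
        capture_output=True,  # collect stdout and stderr
        text=True,            # decode output as str instead of bytes
        **kwargs,
    )


if __name__ == "__main__":
    # Use the current Python interpreter as a command that exists on every platform.
    result = run_command([sys.executable, "-c", "print('ok')"])
    print(result.stdout.strip())
```

Keeping the platform check in one helper means the individual call sites stay identical across operating systems. Because `shell=True` hands the command line to the system shell, it should only be used with trusted, fixed commands.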
	 Sebastian Raschka