Kasen 
							
						 
					 
					
						
						
						
						
							
						
						
							af4b73ca7b 
							
						 
					 
					
						
						
							
							Improve BPE vocabulary saving and pair frequency handling ( #539 )  
						
						
						
						
					 
					
						2025-02-19 09:51:04 -06:00 
						 
				 
			
				
					
						
							
							
								Kasen 
							
						 
					 
					
						
						
						
						
							
						
						
							0a5214b804 
							
						 
					 
					
						
						
							
							Fix incorrect indentation ( #536 )  
						
						
						
						
					 
					
						2025-02-18 14:47:31 -06:00 
						 
				 
			
				
					
						
							
							
								Sebastian Raschka 
							
						 
					 
					
						
						
						
						
							
						
						
							d684ff418a 
							
						 
					 
					
						
						
							
							Fix typo in Ch02 comments ( #516 )  
						
						
						
						
					 
					
						2025-02-04 20:16:07 -06:00 
						 
				 
			
				
					
						
							
							
								Sebastian Raschka 
							
						 
					 
					
						
						
						
						
							
						
						
							dcaac28b92 
							
						 
					 
					
						
						
							
							Bonus material: extending tokenizers ( #496 )  
						
						
						
						
						* Bonus material: extending tokenizers
* small wording update 
						
						
					 
					
						2025-01-22 09:26:54 -06:00 
						 
				 
			
				
					
						
							
							
								Daniel Kleine 
							
						 
					 
					
						
						
						
						
							
						
						
							9175590ea4 
							
						 
					 
					
						
						
							
							add GPT2TokenizerFast to BPE comparison ( #498 )  
						
						
						
						
						* added HF BPE Fast
* update benchmarks
* add note about performance
* revert accidental changes
---------
Co-authored-by: rasbt <mail@sebastianraschka.com> 
						
						
					 
					
						2025-01-22 09:26:44 -06:00 
						 
				 
			
				
					
						
							
							
								Austin Welch 
							
						 
					 
					
						
						
						
						
							
						
						
							654734053a 
							
						 
					 
					
						
						
							
							fix: preserve newline tokens in BPE encoder ( #495 )  
						
						
						
						
						* fix: preserve newline tokens in BPE encoder
* further fixes
* more fixes
---------
Co-authored-by: rasbt <mail@sebastianraschka.com> 
						
						
					 
					
						2025-01-21 12:47:15 -06:00 
						 
				 
			
				
					
						
							
							
								Daniel Kleine 
							
						 
					 
					
						
						
						
						
							
						
						
							3f9facbc55 
							
						 
					 
					
						
						
							
							BPE: fixed typo ( #492 )  
						
						
						
						
						* fixed typo
* use rel path if exists
* mod gitignore and use existing vocab files
---------
Co-authored-by: rasbt <mail@sebastianraschka.com> 
						
						
					 
					
						2025-01-20 20:49:53 -06:00 
						 
				 
			
				
					
						
							
							
								Sebastian Raschka 
							
						 
					 
					
						
						
						
						
							
						
						
							b17d097742 
							
						 
					 
					
						
						
							
							Implementing the BPE Tokenizer from Scratch ( #487 )  
						
						
						
						
					 
					
						2025-01-17 12:22:00 -06:00 
						 
				 
			
				
					
						
							
							
								Henry Shi 
							
						 
					 
					
						
						
						
						
							
						
						
							15af754304 
							
						 
					 
					
						
						
							
							Print out embeddings for more illustrative learning ( #481 )  
						
						
						
						
						* print out embeddings for illustrative learning
* suggestion: print embedding contents
---------
Co-authored-by: rasbt <mail@sebastianraschka.com> 
						
						
					 
					
						2025-01-13 14:44:06 -06:00 
						 
				 
			
				
					
						
							
							
								Tao Qian 
							
						 
					 
					
						
						
						
						
							
						
						
							65ee619d3b 
							
						 
					 
					
						
						
							
							Minor readability improvement in dataloader.ipynb ( #461 )  
						
						
						
						
						* Minor readability improvement in dataloader.ipynb
- The tokenizer and encoded_text variables at the root level are unused.
- The default params for create_dataloader_v1 are confusing, especially for the default batch_size 4, which happens to be the same as the max_length.
* readability improvements
---------
Co-authored-by: rasbt <mail@sebastianraschka.com> 
						
						
					 
					
						2025-01-04 11:26:10 -06:00 
						 
				 
			
				
					
						
							
							
								Sebastian Raschka 
							
						 
					 
					
						
						
						
						
							
						
						
							42b703fc0b 
							
						 
					 
					
						
						
							
							Note about SSL certificates ( #404 )  
						
						
						
						
					 
					
						2024-10-19 16:27:19 -05:00 
						 
				 
			
				
					
						
							
							
								Sebastian Raschka 
							
						 
					 
					
						
						
						
						
							
						
						
							6a9bedc2ec 
							
						 
					 
					
						
						
							
							Update bonus section formatting ( #400 )  
						
						
						
						
					 
					
						2024-10-12 10:26:08 -05:00 
						 
				 
			
				
					
						
							
							
								rasbt 
							
						 
					 
					
						
						
						
						
							
						
						
							3cebcce639 
							
						 
					 
					
						
						
							
							minor spelling fix  
						
						
						
						
					 
					
						2024-09-08 15:35:36 -05:00 
						 
				 
			
				
					
						
							
							
								Gustavo Monti 
							
						 
					 
					
						
						
						
						
							
						
						
							190910e3d6 
							
						 
					 
					
						
						
							
							updating README from chapter 02 including 04_bonus section ( #344 )  
						
						
						
						
						* updating README from chapter 02 including 04_bonus section
* Update ch02/README.md
---------
Co-authored-by: Gustavo Monti Rocha <gustavo.rocha@intelliway.com.br>
Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com> 
						
						
					 
					
						2024-09-05 08:09:46 +02:00 
						 
				 
			
				
					
						
							
							
								Sebastian Raschka 
							
						 
					 
					
						
						
						
						
							
						
						
							f66c089f0b 
							
						 
					 
					
						
						
							
							Test with PyTorch 2.0 and 2.4 ( #290 )  
						
						
						
						
						* Test with PyTorch 2.0 and 2.4
* Update basic-tests-old-pytorch.yml
* skip version cell 
						
						
					 
					
						2024-07-27 15:09:02 -05:00 
						 
				 
			
				
					
						
							
							
								Sebastian Raschka 
							
						 
					 
					
						
						
						
						
							
						
						
							6dd8666d9c 
							
						 
					 
					
						
						
							
							Test code in pytorch 2.4 ( #285 )  
						
						
						
						
						* test code in pytorch 2.4
* update 
						
						
					 
					
						2024-07-24 21:53:41 -05:00 
						 
				 
			
				
					
						
							
							
								Sebastian Raschka 
							
						 
					 
					
						
						
						
						
							
						
						
							3f6f2af3a3 
							
						 
					 
					
						
						
							
							Simplify embedding vs linear layer code ( #278 )  
						
						
						
						
					 
					
						2024-07-21 12:21:10 -05:00 
						 
				 
			
				
					
						
							
							
								Thanh Tran 
							
						 
					 
					
						
						
						
						
							
						
						
							a2bb045984 
							
						 
					 
					
						
						
							
							fix typos & inconsistent texts ( #269 )  
						
						
						
						
						Co-authored-by: TRAN <you@example.com> 
						
						
					 
					
						2024-07-17 07:34:51 -05:00 
						 
				 
			
				
					
						
							
							
								rasbt 
							
						 
					 
					
						
						
						
						
							
						
						
							ee1d4730ba 
							
						 
					 
					
						
						
							
							fixes bold font  #267  
						
						
						
						
					 
					
						2024-07-16 17:51:15 -05:00 
						 
				 
			
				
					
						
							
							
								Daniel Kleine 
							
						 
					 
					
						
						
						
						
							
						
						
							7e0dd7f765 
							
						 
					 
					
						
						
							
							minor: removed redundant imports ( #260 )  
						
						
						
						
						* removed duplicated imports
* removed empty cell 
						
						
					 
					
						2024-07-05 15:33:19 -05:00 
						 
				 
			
				
					
						
							
							
								rasbt 
							
						 
					 
					
						
						
						
						
							
						
						
							bd216fdade 
							
						 
					 
					
						
						
							
							update decode method  
						
						
						
						
					 
					
						2024-07-05 08:34:27 -05:00 
						 
				 
			
				
					
						
							
							
								Suman Debnath 
							
						 
					 
					
						
						
						
						
							
						
						
							46f4d9e575 
							
						 
					 
					
						
						
							
							fixing the regular expression used in the SimpleTokenizer ( #259 )  
						
						
						
						
						* fixing the regular expression used in the SimpleTokenizer class and a typo in the 2.7 Creating token embedding introduction section
* rerun
---------
Co-authored-by: rasbt <mail@sebastianraschka.com> 
						
						
					 
					
						2024-07-04 12:27:27 -05:00 
						 
				 
			
				
					
						
							
							
								rasbt 
							
						 
					 
					
						
						
						
						
							
						
						
							64536ca40f 
							
						 
					 
					
						
						
							
							update figures  
						
						
						
						
					 
					
						2024-07-02 17:12:42 -05:00 
						 
				 
			
				
					
						
							
							
								rasbt 
							
						 
					 
					
						
						
						
						
							
						
						
							5e24a042c1 
							
						 
					 
					
						
						
							
							add links to summary sections  
						
						
						
						
					 
					
						2024-06-29 07:33:26 -05:00 
						 
				 
			
				
					
						
							
							
								Sebastian Raschka 
							
						 
					 
					
						
						
						
						
							
						
						
							4fef19e016 
							
						 
					 
					
						
						
							
							remove redundant code lines ( #247 )  
						
						
						
						
					 
					
						2024-06-25 21:44:19 -05:00 
						 
				 
			
				
					
						
							
							
								rasbt 
							
						 
					 
					
						
						
						
						
							
						
						
							f46441d53f 
							
						 
					 
					
						
						
							
							update with latest versions  
						
						
						
						
					 
					
						2024-06-25 21:09:27 -05:00 
						 
				 
			
				
					
						
							
							
								Daniel Kleine 
							
						 
					 
					
						
						
						
						
							
						
						
							7a54d383e7 
							
						 
					 
					
						
						
							
							minor fixes ( #246 )  
						
						
						
						
						* removed duplicated white spaces
* Update ch07/01_main-chapter-code/ch07.ipynb
* Update ch07/05_dataset-generation/llama3-ollama.ipynb
* removed duplicated white spaces
* fixed title again
---------
Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com> 
						
						
					 
					
						2024-06-25 17:30:30 -05:00 
						 
				 
			
				
					
						
							
							
								rasbt 
							
						 
					 
					
						
						
						
						
							
						
						
							c1f9361428 
							
						 
					 
					
						
						
							
							add main and optional sections  
						
						
						
						
					 
					
						2024-06-19 17:48:25 -05:00 
						 
				 
			
				
					
						
							
							
								Daniel Kleine 
							
						 
					 
					
						
						
						
						
							
						
						
							73be1c592f 
							
						 
					 
					
						
						
							
							fixed num_workers ( #229 )  
						
						
						
						
						* fixed num_workers
* ch06 & ch07: added num_workers to create_dataloader_v1 
						
						
					 
					
						2024-06-19 17:36:46 -05:00 
						 
				 
			
				
					
						
							
							
								rasbt 
							
						 
					 
					
						
						
						
						
							
						
						
							b2ff989174 
							
						 
					 
					
						
						
							
							distinguish better between main chapter code and bonus materials  
						
						
						
						
					 
					
						2024-06-11 21:07:42 -05:00 
						 
				 
			
				
					
						
							
							
								rasbt 
							
						 
					 
					
						
						
						
						
							
						
						
							e1adeb14f3 
							
						 
					 
					
						
						
							
							add allowed_special={"<|endoftext|>"}  
						
						
						
						
					 
					
						2024-06-09 06:04:02 -05:00 
						 
				 
			
				
					
						
							
							
								Sebastian Raschka 
							
						 
					 
					
						
						
						
						
							
						
						
							40ba3a4068 
							
						 
					 
					
						
						
							
							Remove leftover instances of self.tokenizer ( #201 )  
						
						
						
						
						* Remove leftover instances of self.tokenizer
* add endoftext token 
						
						
					 
					
						2024-06-08 14:57:34 -05:00 
						 
				 
			
				
					
						
							
							
								rasbt 
							
						 
					 
					
						
						
						
						
							
						
						
							20f1ef553c 
							
						 
					 
					
						
						
							
							update figure 2.13  
						
						
						
						
					 
					
						2024-06-01 09:38:33 -05:00 
						 
				 
			
				
					
						
							
							
								rasbt 
							
						 
					 
					
						
						
						
						
							
						
						
							fe8bb9291e 
							
						 
					 
					
						
						
							
							update formatting  
						
						
						
						
					 
					
						2024-05-24 07:20:37 -05:00 
						 
				 
			
				
					
						
							
							
								rasbt 
							
						 
					 
					
						
						
						
						
							
						
						
							1407085f07 
							
						 
					 
					
						
						
							
							reset cell count for better nbdiff  
						
						
						
						
					 
					
						2024-05-22 20:27:09 -05:00 
						 
				 
			
				
					
						
							
							
								rasbt 
							
						 
					 
					
						
						
						
						
							
						
						
							85c3210105 
							
						 
					 
					
						
						
							
							update regex  
						
						
						
						
					 
					
						2024-05-22 20:15:31 -05:00 
						 
				 
			
				
					
						
							
							
								rasbt 
							
						 
					 
					
						
						
						
						
							
						
						
							678fad50bc 
							
						 
					 
					
						
						
							
							formatting for consistency with production chapter  
						
						
						
						
					 
					
						2024-05-18 11:03:42 -05:00 
						 
				 
			
				
					
						
							
							
								rasbt 
							
						 
					 
					
						
						
						
						
							
						
						
							6c6321f671 
							
						 
					 
					
						
						
							
							simplify code  
						
						
						
						
					 
					
						2024-05-16 20:16:25 -05:00 
						 
				 
			
				
					
						
							
							
								Sebastian Raschka 
							
						 
					 
					
						
						
						
						
							
						
						
							0f03c20483 
							
						 
					 
					
						
						
							
							Data loader intuition with numbers ( #132 )  
						
						
						
						
						* data loader intuition with numbers
* fix link
* fix tests 
						
						
					 
					
						2024-04-27 07:56:41 -05:00 
						 
				 
			
				
					
						
							
							
								rasbt 
							
						 
					 
					
						
						
						
						
							
						
						
							379a8ab39c 
							
						 
					 
					
						
						
							
							update figures in bonus notebook  
						
						
						
						
					 
					
						2024-04-23 21:01:27 -05:00 
						 
				 
			
				
					
						
							
							
								Sebastian Raschka 
							
						 
					 
					
						
						
						
						
							
						
						
							44a009f7e6 
							
						 
					 
					
						
						
							
							update stride wording  
						
						
						
						
					 
					
						2024-04-22 20:40:48 -05:00 
						 
				 
			
				
					
						
							
							
								Sebastian Raschka 
							
						 
					 
					
						
						
						
						
							
						
						
							bae4b0fb08 
							
						 
					 
					
						
						
							
							Make datasets and loaders compatible with multiprocessing ( #118 )  
						
						
						
						
					 
					
						2024-04-13 13:57:56 -05:00 
						 
				 
			
				
					
						
							
							
								Sebastian Raschka 
							
						 
					 
					
						
						
						
						
							
						
						
							bbce1cb143 
							
						 
					 
					
						
						
							
							Automated link checking ( #117 )  
						
						
						
						
						* Automated link checking
* Fix links in Jupyter Nbs 
						
						
					 
					
						2024-04-12 19:08:34 -04:00 
						 
				 
			
				
					
						
							
							
								James Holcombe 
							
						 
					 
					
						
						
						
						
							
						
						
							0b866c133f 
							
						 
					 
					
						
						
							
							Use instance tokenizer ( #116 )  
						
						
						
						
						* Use instance tokenizer
* consistency updates
---------
Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com> 
						
						
					 
					
						2024-04-10 21:16:19 -04:00 
						 
				 
			
				
					
						
							
							
								Sebastian Raschka 
							
						 
					 
					
						
						
						
						
							
						
						
							ccd7cebbb3 
							
						 
					 
					
						
						
							
							Rename variable to context_length to make it easier on readers ( #106 )  
						
						
						
						
						* rename to context length
* fix spacing 
						
						
					 
					
						2024-04-04 07:27:41 -05:00 
						 
				 
			
				
					
						
							
							
								rasbt 
							
						 
					 
					
						
						
						
						
							
						
						
							edcae09884 
							
						 
					 
					
						
						
							
							improve importlib experience for windows users  
						
						
						
						
					 
					
						2024-04-03 06:31:15 -05:00 
						 
				 
			
				
					
						
							
							
								Intelligence-Manifesto 
							
						 
					 
					
						
						
						
						
							
						
						
							d081928e90 
							
						 
					 
					
						
						
							
							code -> markdown ( #101 )  
						
						
						
						
					 
					
						2024-04-02 14:37:45 -05:00 
						 
				 
			
				
					
						
							
							
								rasbt 
							
						 
					 
					
						
						
						
						
							
						
						
							1c173e4f44 
							
						 
					 
					
						
						
							
							update figures  
						
						
						
						
					 
					
						2024-03-30 09:43:51 -05:00 
						 
				 
			
				
					
						
							
							
								rasbt 
							
						 
					 
					
						
						
						
						
							
						
						
							ca96b7aee5 
							
						 
					 
					
						
						
							
							minor updates  
						
						
						
						
					 
					
						2024-03-29 20:42:32 -05:00 
						 
				 
			
				
					
						
							
							
								Jeff Hammerbacher 
							
						 
					 
					
						
						
						
						
							
						
						
							5b222e2d6f 
							
						 
					 
					
						
						
							
							Fix small typos in ch02.ipynb ( #89 )  
						
						
						
						
					 
					
						2024-03-29 08:25:52 -05:00