Mirror of https://github.com/rasbt/LLMs-from-scratch.git, synced 2025-10-25 23:11:23 +00:00
Commit 52e10c7360: use correct chapter reference
Parent: 64536ca40f
Author: rasbt
@@ -148,7 +148,7 @@
     "- `\"n_heads\"` is the number of attention heads in the multi-head attention mechanism implemented in Chapter 3\n",
     "- `\"n_layers\"` is the number of transformer blocks within the model, which we'll implement in upcoming sections\n",
     "- `\"drop_rate\"` is the dropout mechanism's intensity, discussed in Chapter 3; 0.1 means dropping 10% of hidden units during training to mitigate overfitting\n",
-    "- `\"qkv_bias\"` decides if the `Linear` layers in the multi-head attention mechanism (from Chapter 3) should include a bias vector when computing query (Q), key (K), and value (V) tensors; we'll disable this option, which is standard practice in modern LLMs; however, we'll revisit this later when loading pretrained GPT-2 weights from OpenAI into our reimplementation in Chapter 6"
+    "- `\"qkv_bias\"` decides if the `Linear` layers in the multi-head attention mechanism (from Chapter 3) should include a bias vector when computing query (Q), key (K), and value (V) tensors; we'll disable this option, which is standard practice in modern LLMs; however, we'll revisit this later when loading pretrained GPT-2 weights from OpenAI into our reimplementation in chapter 5"
    ]
   },
   {
@@ -1238,7 +1238,7 @@
    "metadata": {},
    "source": [
     "- In practice, I found it easier to train the model without weight-tying, which is why we didn't implement it here\n",
-    "- However, we will revisit and apply this weight-tying idea later when we load the pretrained weights in Chapter 6\n",
+    "- However, we will revisit and apply this weight-tying idea later when we load the pretrained weights in chapter 5\n",
     "- Lastly, we can compute the memory requirements of the model as follows, which can be a helpful reference point:"
    ]
   },
|  | |||||||
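The second hunk refers to weight tying: reusing the token-embedding matrix as the output-projection matrix, which is what GPT-2's pretrained weights assume. A minimal PyTorch sketch of the idea, with layer names chosen for illustration (not necessarily the book's):

```python
import torch.nn as nn

vocab_size, emb_dim = 50257, 768  # illustrative GPT-2 124M dimensions

tok_emb = nn.Embedding(vocab_size, emb_dim)             # token embeddings
out_head = nn.Linear(emb_dim, vocab_size, bias=False)   # output projection

# Weight tying: both layers now share one (vocab_size, emb_dim) parameter
# tensor, so gradients flow into it from both ends of the model.
out_head.weight = tok_emb.weight

assert out_head.weight.data_ptr() == tok_emb.weight.data_ptr()
```

The shapes line up because an `nn.Linear(emb_dim, vocab_size)` stores its weight as a `(vocab_size, emb_dim)` matrix, the same shape as the embedding table.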
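The last bullet in the second hunk mentions computing the model's memory requirements. A back-of-the-envelope version, assuming float32 parameters (4 bytes each); the tiny `model` here is a stand-in, and any `nn.Module` works:

```python
import torch.nn as nn

# Stand-in model for illustration; substitute the actual GPT model.
model = nn.Sequential(nn.Linear(768, 3072), nn.GELU(), nn.Linear(3072, 768))

total_params = sum(p.numel() for p in model.parameters())  # parameter count
total_size_mb = total_params * 4 / (1024 ** 2)  # float32 = 4 bytes/parameter
print(f"Parameters: {total_params:,}")
print(f"Approx. memory requirement: {total_size_mb:.2f} MB")
```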