diff --git a/ch05/01_main-chapter-code/ch05.ipynb b/ch05/01_main-chapter-code/ch05.ipynb
index 0b4fb25..cd90777 100644
--- a/ch05/01_main-chapter-code/ch05.ipynb
+++ b/ch05/01_main-chapter-code/ch05.ipynb
@@ -161,7 +161,7 @@
    "source": [
     "- We use dropout of 0.1 above, but it's relatively common to train LLMs without dropout nowadays\n",
     "- Modern LLMs also don't use bias vectors in the `nn.Linear` layers for the query, key, and value matrices (unlike earlier GPT models), which is achieved by setting `\"qkv_bias\": False`\n",
-    "- We reduce the context length (`context_length`) of only 256 tokens to reduce the computational resource requirements for training the model, whereas the original 124 million parameter GPT-2 model used 1024 characters\n",
+    "- We reduce the context length (`context_length`) of only 256 tokens to reduce the computational resource requirements for training the model, whereas the original 124 million parameter GPT-2 model used 1024 tokens\n",
     "  - This is so that more readers will be able to follow and execute the code examples on their laptop computer\n",
     "  - However, please feel free to increase the `context_length` to 1024 tokens (this would not require any code changes)\n",
     "  - We will also load a model with a 1024 `context_length` later from pretrained weights"
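
The markdown cell changed above describes configuration choices for the model trained in this chapter. A minimal sketch of what such a configuration dictionary might look like is given below; it is not part of the diff. The values for `context_length`, the dropout rate, and `qkv_bias` come from the cell text itself, while the remaining key names and values are assumptions based on the standard 124-million-parameter GPT-2 architecture.

```python
# Illustrative sketch (not part of the diff): a GPT-2 124M-style configuration
# reflecting the settings described in the cell above. Key names other than
# "context_length" and "qkv_bias" (e.g., "drop_rate", "emb_dim") are assumptions.
GPT_CONFIG_124M = {
    "vocab_size": 50257,     # BPE vocabulary size used by GPT-2
    "context_length": 256,   # shortened from GPT-2's original 1024 tokens
    "emb_dim": 768,          # embedding dimension of the 124M-parameter model
    "n_heads": 12,           # number of attention heads
    "n_layers": 12,          # number of transformer blocks
    "drop_rate": 0.1,        # dropout of 0.1, as mentioned in the cell
    "qkv_bias": False,       # no bias in the query/key/value projections
}
```

Raising `"context_length"` to 1024 would match the original GPT-2 setting and, as the cell notes, would not require any other code changes, only more memory and compute during training.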