Mirror of https://github.com/rasbt/LLMs-from-scratch.git, synced 2025-10-26 23:39:53 +00:00
fix 1024 characters to 1024 tokens (#152)
This commit is contained in: commit 7b34833ee1 (parent c94f24e759)
@@ -161,7 +161,7 @@
 "source": [
 "- We use dropout of 0.1 above, but it's relatively common to train LLMs without dropout nowadays\n",
 "- Modern LLMs also don't use bias vectors in the `nn.Linear` layers for the query, key, and value matrices (unlike earlier GPT models), which is achieved by setting `\"qkv_bias\": False`\n",
-"- We reduce the context length (`context_length`) of only 256 tokens to reduce the computational resource requirements for training the model, whereas the original 124 million parameter GPT-2 model used 1024 characters\n",
+"- We reduce the context length (`context_length`) of only 256 tokens to reduce the computational resource requirements for training the model, whereas the original 124 million parameter GPT-2 model used 1024 tokens\n",
 " - This is so that more readers will be able to follow and execute the code examples on their laptop computer\n",
 " - However, please feel free to increase the `context_length` to 1024 tokens (this would not require any code changes)\n",
 " - We will also load a model with a 1024 `context_length` later from pretrained weights"
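The settings discussed in the changed notebook text (dropout of 0.1, no QKV bias, a context length of 256 tokens instead of GPT-2's original 1024 tokens) are typically collected in a single configuration dictionary. Below is a minimal sketch of what such a dictionary could look like; the dictionary name and any values not mentioned in the diff (vocabulary size, embedding dimension, head and layer counts) are assumptions for illustration, not taken from this commit.

# Hypothetical configuration dict illustrating the settings discussed above.
# Only "drop_rate", "qkv_bias", and "context_length" are mentioned in the diff;
# the remaining keys and values are assumptions based on the 124M-parameter GPT-2 setup.
GPT_CONFIG_124M = {
    "vocab_size": 50257,     # assumed GPT-2 BPE vocabulary size
    "context_length": 256,   # reduced from the original 1024 tokens to lower compute requirements
    "emb_dim": 768,          # assumed embedding size for the 124M-parameter model
    "n_heads": 12,           # assumed number of attention heads
    "n_layers": 12,          # assumed number of transformer blocks
    "drop_rate": 0.1,        # dropout of 0.1, as mentioned in the notebook text
    "qkv_bias": False        # no bias vectors in the query/key/value nn.Linear layers
}

# Increasing the context length back to 1024 tokens would need no other code changes:
# GPT_CONFIG_124M["context_length"] = 1024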