Mirror of https://github.com/rasbt/LLMs-from-scratch.git (synced 2025-10-30 09:20:46 +00:00)
ch05/07 gpt_to_llama text improvements (#369)
* fixed typo
* fixed RMSNorm formula
* fixed SwiGLU formula
* temperature=0 for untrained model for reproducibility
* added extra info on HF token
This commit is contained in:
parent d144bd5b7a
commit ff31b345b0
@@ -143,7 +143,7 @@
 "- LayerNorm normalizes inputs using mean and variance, while RMSNorm uses only the root mean square, which improves computational efficiency\n",
 "- The RMSNorm operation is as follows, where $x$ is the input $\\gamma$ is a trainable parameter (vector), and $\\epsilon$ is a small constant to avoid zero-division errors:\n",
 "\n",
-"$$y = \\frac{x}{\\sqrt{\\text{RMS}[x]} + \\epsilon} * \\gamma$$\n",
+"$$y = \\frac{x}{\\sqrt{\\text{RMS}[x^2]} + \\epsilon} * \\gamma$$\n",
 "\n",
 "- For more details, please see the paper [Root Mean Square Layer Normalization (2019)](https://arxiv.org/abs/1910.07467)"
 ]
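For reference, the corrected formula corresponds to the standard RMSNorm computation, where the term under the square root is the mean of the squared inputs. A minimal PyTorch sketch of that computation (illustrative only, not the notebook's exact class; note that most implementations place epsilon inside the square root for numerical stability, whereas the formula above adds it after):

    import torch
    import torch.nn as nn

    class RMSNorm(nn.Module):
        def __init__(self, emb_dim, eps=1e-5):
            super().__init__()
            self.eps = eps
            self.weight = nn.Parameter(torch.ones(emb_dim))  # trainable gamma vector

        def forward(self, x):
            # Mean of the squared inputs over the feature dimension
            means = x.pow(2).mean(dim=-1, keepdim=True)
            # x / sqrt(mean(x^2) + eps), then scale by gamma
            return x * torch.rsqrt(means + self.eps) * self.weight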
@@ -313,7 +313,7 @@
 "- In fact, Llama uses a \"Gates Linear Unit\" (GLU) variant of SiLU called SwiGLU, which essentially results in a slightly differently structured `FeedForward` module\n",
 "- SwiGLU uses a gating mechanism in the feedforward layer, with the formula:\n",
 "\n",
-"$$\\text{SwiGLU}(x) = (\\text{Linear}_1(x) * \\text{SiLU}(\\text{Linear}_2(x)))$$\n",
+"$$\\text{SwiGLU}(x) = \\text{SiLU}(\\text{Linear}_1(x)) * (\\text{Linear}_2(x))$$\n",
 "\n",
 "- Here, $\\text{Linear}_1$ and $\\text{Linear}_2$ are two linear layers, and $*$ denotes element-wise multiplication\n",
 "- The third linear layer, $\\text{Linear}_3$, is applied after this gated activation\n",
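The corrected formula applies SiLU to the first linear projection and gates it by the second via element-wise multiplication. A minimal sketch of the resulting FeedForward structure (the names fc1/fc2/fc3 and the hidden_dim argument are illustrative assumptions, not necessarily the notebook's code):

    import torch.nn as nn

    class FeedForward(nn.Module):
        def __init__(self, emb_dim, hidden_dim):
            super().__init__()
            self.fc1 = nn.Linear(emb_dim, hidden_dim, bias=False)  # Linear_1
            self.fc2 = nn.Linear(emb_dim, hidden_dim, bias=False)  # Linear_2
            self.fc3 = nn.Linear(hidden_dim, emb_dim, bias=False)  # Linear_3
            self.silu = nn.SiLU()

        def forward(self, x):
            # SwiGLU(x) = SiLU(Linear_1(x)) * Linear_2(x), followed by Linear_3
            return self.fc3(self.silu(self.fc1(x)) * self.fc2(x))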
@@ -519,7 +519,7 @@
 "- Here, we modify the `MultiHeadAttention` class with the appropriate RoPE code\n",
 "- In addition, we remove the `qkv_bias` option and hardcode the `bias=False` setting\n",
 "- Also, we add a dtype setting to be able to instantiate the model with a lower precision later\n",
-" - Tip: since the `TransformerBlock's (in the next section) are repeated exactly, we could simplify the code and only initialize the buffers once instead for each `MultiHeadAttention` module; however, we add the precomputed RoPE parameters to the `MultiHeadAttention` class so that it can function as a standalone module"
+" - Tip: since the `TransformerBlock`s (in the next section) are repeated exactly, we could simplify the code and only initialize the buffers once instead for each `MultiHeadAttention` module; however, we add the precomputed RoPE parameters to the `MultiHeadAttention` class so that it can function as a standalone module"
 ]
 },
 {
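To illustrate the tip about buffers: a sketch of how precomputed RoPE parameters might be registered once per MultiHeadAttention module so the class remains standalone (the helper precompute_rope_params and the buffer names are hypothetical; the notebook's actual RoPE code differs in detail):

    import torch
    import torch.nn as nn

    def precompute_rope_params(head_dim, context_length, theta_base=10_000):
        # Hypothetical helper: one inverse frequency per dimension pair
        inv_freq = 1.0 / (theta_base ** (torch.arange(0, head_dim, 2).float() / head_dim))
        angles = torch.arange(context_length)[:, None] * inv_freq[None, :]
        angles = torch.cat([angles, angles], dim=-1)  # (context_length, head_dim)
        return torch.cos(angles), torch.sin(angles)

    class MultiHeadAttention(nn.Module):
        def __init__(self, d_out, num_heads, context_length, dtype=None):
            super().__init__()
            cos, sin = precompute_rope_params(d_out // num_heads, context_length)
            # Stored per module so the class works standalone; since the
            # TransformerBlocks repeat exactly, these buffers could instead
            # be computed once and shared to save memory
            self.register_buffer("cos", cos)
            self.register_buffer("sin", sin)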
@@ -1068,7 +1068,7 @@
 },
 "source": [
 "- Please note that Meta AI requires that you accept the Llama 2 licensing terms before you can download the files; to do this, you have to create a Hugging Face Hub account and visit the [meta-llama/Llama-2-7b](https://huggingface.co/meta-llama/Llama-2-7b) repository to accept the terms\n",
-"- Next, you will need to create an access token; to generate an access token, click on the profile picture in the upper right and click on \"Settings\"\n",
+"- Next, you will need to create an access token; to generate an access token with READ permissions, click on the profile picture in the upper right and click on \"Settings\"\n",
 "\n",
 "\n",
 "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/bonus/gpt-to-llama/settings.webp?1\" width=\"300px\">\n",
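With a READ-permission token generated there, authentication before downloading typically looks like this (a sketch using the huggingface_hub login helper; how the notebook itself reads the token is not shown in this diff):

    from huggingface_hub import login

    # Paste the access token with READ permissions from Settings -> Access Tokens
    login(token="hf_...")  # placeholder value; never hard-code a real token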
@@ -1237,7 +1237,7 @@
 " max_new_tokens=30,\n",
 " context_size=LLAMA2_CONFIG_7B[\"context_length\"],\n",
 " top_k=1,\n",
-" temperature=1.0\n",
+" temperature=0.\n",
 ")\n",
 "\n",
 "print(\"Output text:\\n\", token_ids_to_text(token_ids, tokenizer))"
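The change from temperature=1.0 to temperature=0. makes the untrained model's output reproducible: by the usual convention, a temperature of zero disables sampling and falls back to greedy (argmax) decoding. A sketch of that convention (assumed here; the notebook's generate function may differ in detail):

    import torch

    def select_next_token(logits, temperature=0.0):
        # logits: (batch, vocab_size) scores for the last position
        if temperature > 0.0:
            probs = torch.softmax(logits / temperature, dim=-1)
            return torch.multinomial(probs, num_samples=1)  # stochastic sampling
        # temperature == 0: deterministic greedy decoding
        return torch.argmax(logits, dim=-1, keepdim=True)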
@@ -1565,7 +1565,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.10.6"
+"version": "3.10.11"
 }
 },
 "nbformat": 4,