mirror of
https://github.com/rasbt/LLMs-from-scratch.git
synced 2025-08-28 18:40:01 +00:00
improve formatting
This commit is contained in:
parent
4541177063
commit
a23fca84d5
@ -39,10 +39,10 @@
|
||||
"id": "aFmxTQbwCUMl"
|
||||
},
|
||||
"source": [
|
||||
"- In this notebook, we convert the original GPT and GPT-2 architecture into a Llama 2 model step by step\n",
|
||||
"- In this notebook, we convert the original GPT architecture into a Llama 2 model step by step (note the GPT and GPT-2 share the same architecture)\n",
|
||||
"- Why not Llama 1 or Llama 3?\n",
|
||||
" - The Llama 1 architecture is similar to Llama 2, except that Llama 2 has a larger context window (which is nice); the Llama 1 weights are not readily available and have more usage restrictions, so it makes more sense to focus on Llama 2\n",
|
||||
" - Regarding Llama 3, I will share a separate notebook to convert Llama 2 to Llama 3 (there are only a few small additional changes)\n",
|
||||
" - The Llama 1 architecture is similar to Llama 2, except that Llama 2 has a larger context window (which is nice); the Llama 1 weights are not readily available and have more usage restrictions, so it makes more sense to focus on Llama 2\n",
|
||||
" - Regarding Llama 3, I will share a separate notebook to convert Llama 2 to Llama 3 (there are only a few small additional changes)\n",
|
||||
"- The explanations are purposefully kept minimal in this notebook not to bloat it unnecessarily and focus on the main code\n",
|
||||
"- For more information, please see the Llama 2 paper: [Llama 2: Open Foundation and Fine-Tuned Chat Models (2023)](https://arxiv.org/abs/2307.09288)"
|
||||
]
|
||||
@ -143,7 +143,7 @@
|
||||
"- LayerNorm normalizes inputs using mean and variance, while RMSNorm uses only the root mean square, which improves computational efficiency\n",
|
||||
"- The RMSNorm operation is as follows, where $x$ is the input $\\gamma$ is a trainable parameter (vector), and $\\epsilon$ is a small constant to avoid zero-division errors:\n",
|
||||
"\n",
|
||||
"$$y = \\frac{x}{\\sqrt{\\text{RMS}[x^2]} + \\epsilon} * \\gamma$$\n",
|
||||
"$$y_i = \\frac{x_i}{\\text{RMS}(x)} \\gamma_i, \\quad \\text{where} \\quad \\text{RMS}(x) = \\sqrt{\\epsilon + \\frac{1}{n} \\sum x_i^2}$$\n",
|
||||
"\n",
|
||||
"- For more details, please see the paper [Root Mean Square Layer Normalization (2019)](https://arxiv.org/abs/1910.07467)"
|
||||
]
|
||||
@ -1565,7 +1565,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.11"
|
||||
"version": "3.11.4"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
Loading…
x
Reference in New Issue
Block a user