diff --git a/ch05/07_gpt_to_llama/converting-gpt-to-llama2.ipynb b/ch05/07_gpt_to_llama/converting-gpt-to-llama2.ipynb
index 65113dc..489dad7 100644
--- a/ch05/07_gpt_to_llama/converting-gpt-to-llama2.ipynb
+++ b/ch05/07_gpt_to_llama/converting-gpt-to-llama2.ipynb
@@ -143,7 +143,7 @@
     "- LayerNorm normalizes inputs using mean and variance, while RMSNorm uses only the root mean square, which improves computational efficiency\n",
     "- The RMSNorm operation is as follows, where $x$ is the input, $\\gamma$ is a trainable parameter (vector), and $\\epsilon$ is a small constant to avoid zero-division errors:\n",
     "\n",
-    "$$y = \\frac{x}{\\sqrt{\\text{RMS}[x]} + \\epsilon} * \\gamma$$\n",
+    "$$y = \\frac{x}{\\text{RMS}[x]} * \\gamma, \\quad \\text{where} \\quad \\text{RMS}[x] = \\sqrt{\\frac{1}{n} \\sum_{i=1}^{n} x_i^2 + \\epsilon}$$\n",
     "\n",
     "- For more details, please see the paper [Root Mean Square Layer Normalization (2019)](https://arxiv.org/abs/1910.07467)"
    ]
@@ -313,7 +313,7 @@
     "- In fact, Llama uses a \"Gated Linear Unit\" (GLU) variant of SiLU called SwiGLU, which essentially results in a slightly differently structured `FeedForward` module\n",
     "- SwiGLU uses a gating mechanism in the feedforward layer, with the formula:\n",
     "\n",
-    "$$\\text{SwiGLU}(x) = (\\text{Linear}_1(x) * \\text{SiLU}(\\text{Linear}_2(x)))$$\n",
+    "$$\\text{SwiGLU}(x) = \\text{SiLU}(\\text{Linear}_1(x)) * \\text{Linear}_2(x)$$\n",
     "\n",
     "- Here, $\\text{Linear}_1$ and $\\text{Linear}_2$ are two linear layers, and $*$ denotes element-wise multiplication\n",
     "- The third linear layer, $\\text{Linear}_3$, is applied after this gated activation\n",
@@ -519,7 +519,7 @@
     "- Here, we modify the `MultiHeadAttention` class with the appropriate RoPE code\n",
     "- In addition, we remove the `qkv_bias` option and hardcode the `bias=False` setting\n",
     "- Also, we add a dtype setting to be able to instantiate the model with a lower precision later\n",
-    "  - Tip: since the `TransformerBlock's (in the next section) are repeated exactly, we could simplify the code and only initialize the buffers once instead for each `MultiHeadAttention` module; however, we add the precomputed RoPE parameters to the `MultiHeadAttention` class so that it can function as a standalone module"
+    "  - Tip: since the `TransformerBlock`s (in the next section) are repeated exactly, we could simplify the code and only initialize the buffers once instead of once for each `MultiHeadAttention` module; however, we add the precomputed RoPE parameters to the `MultiHeadAttention` class so that it can function as a standalone module"
    ]
   },
   {
@@ -1068,7 +1068,7 @@
    },
    "source": [
     "- Please note that Meta AI requires that you accept the Llama 2 licensing terms before you can download the files; to do this, you have to create a Hugging Face Hub account and visit the [meta-llama/Llama-2-7b](https://huggingface.co/meta-llama/Llama-2-7b) repository to accept the terms\n",
-    "- Next, you will need to create an access token; to generate an access token, click on the profile picture in the upper right and click on \"Settings\"\n",
+    "- Next, you will need to create an access token with READ permissions; to generate it, click on the profile picture in the upper right and click on \"Settings\"\n",
     "\n",
     "\n",
     "\n",
@@ -1237,7 +1237,7 @@
     "    max_new_tokens=30,\n",
     "    context_size=LLAMA2_CONFIG_7B[\"context_length\"],\n",
     "    top_k=1,\n",
-    "    temperature=1.0\n",
+    "    temperature=0.\n",
     ")\n",
     "\n",
     "print(\"Output text:\\n\", token_ids_to_text(token_ids, tokenizer))"
@@ -1565,7 +1565,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.10.6"
+   "version": "3.10.11"
   }
  },
 "nbformat": 4,