From c735c21e8745b62d1f5ed4a3a6a1c2ebb59d9615 Mon Sep 17 00:00:00 2001
From: rasbt
Date: Wed, 1 May 2024 20:26:17 -0500
Subject: [PATCH] fix swiglu acronym

---
 ch04/01_main-chapter-code/ch04.ipynb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ch04/01_main-chapter-code/ch04.ipynb b/ch04/01_main-chapter-code/ch04.ipynb
index ac1d5e0..e1a67f6 100644
--- a/ch04/01_main-chapter-code/ch04.ipynb
+++ b/ch04/01_main-chapter-code/ch04.ipynb
@@ -581,7 +581,7 @@
     "- In this section, we implement a small neural network submodule that is used as part of the transformer block in LLMs\n",
     "- We start with the activation function\n",
     "- In deep learning, ReLU (Rectified Linear Unit) activation functions are commonly used due to their simplicity and effectiveness in various neural network architectures\n",
-    "- In LLMs, various other types of activation functions are used beyond the traditional ReLU; two notable examples are GELU (Gaussian Error Linear Unit) and SwiGLU (Sigmoid-Weighted Linear Unit)\n",
+    "- In LLMs, various other types of activation functions are used beyond the traditional ReLU; two notable examples are GELU (Gaussian Error Linear Unit) and SwiGLU (Swish-Gated Linear Unit)\n",
     "- GELU and SwiGLU are more complex, smooth activation functions incorporating Gaussian and sigmoid-gated linear units, respectively, offering better performance for deep learning models, unlike the simpler, piecewise linear function of ReLU"
    ]
   },
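
For reference, a minimal Python sketch of the two activation functions named in the changed line follows. This is an illustrative assumption, not the notebook's own implementation: the class names GELU and SwiGLUFeedForward and the parameters d_model and d_ff are hypothetical, the GELU uses the common tanh-based approximation, and SwiGLU is shown in its usual gated feed-forward form, SiLU(x W1) elementwise-multiplied with (x W2), followed by W3.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GELU(nn.Module):
    # Tanh-based approximation of the Gaussian Error Linear Unit
    def forward(self, x):
        return 0.5 * x * (1.0 + torch.tanh(
            torch.sqrt(torch.tensor(2.0 / torch.pi))
            * (x + 0.044715 * x**3)
        ))

class SwiGLUFeedForward(nn.Module):
    # SwiGLU (Swish-Gated Linear Unit) feed-forward block (hypothetical sketch):
    # output = W3( SiLU(W1 x) * W2 x ), where SiLU is the Swish activation
    def __init__(self, d_model, d_ff):
        super().__init__()
        self.w1 = nn.Linear(d_model, d_ff, bias=False)  # gate projection
        self.w2 = nn.Linear(d_model, d_ff, bias=False)  # value projection
        self.w3 = nn.Linear(d_ff, d_model, bias=False)  # output projection

    def forward(self, x):
        return self.w3(F.silu(self.w1(x)) * self.w2(x))

# Usage sketch: apply both to a random batch of token embeddings
x = torch.randn(2, 4, 768)
print(GELU()(x).shape)                        # torch.Size([2, 4, 768])
print(SwiGLUFeedForward(768, 2048)(x).shape)  # torch.Size([2, 4, 768])

The gating here is what motivates the corrected acronym: the sigmoid appears only inside the Swish/SiLU gate, so "Swish-Gated Linear Unit" describes the construction more accurately than "Sigmoid-Weighted Linear Unit", which names SiLU itself.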