Mirror of https://github.com/rasbt/LLMs-from-scratch.git, synced 2025-09-01 20:38:11 +00:00
fix swiglu acronym
parent aec169dc12
commit c735c21e87
@@ -581,7 +581,7 @@
     "- In this section, we implement a small neural network submodule that is used as part of the transformer block in LLMs\n",
     "- We start with the activation function\n",
     "- In deep learning, ReLU (Rectified Linear Unit) activation functions are commonly used due to their simplicity and effectiveness in various neural network architectures\n",
-    "- In LLMs, various other types of activation functions are used beyond the traditional ReLU; two notable examples are GELU (Gaussian Error Linear Unit) and SwiGLU (Sigmoid-Weighted Linear Unit)\n",
+    "- In LLMs, various other types of activation functions are used beyond the traditional ReLU; two notable examples are GELU (Gaussian Error Linear Unit) and SwiGLU (Swish-Gated Linear Unit)\n",
     "- GELU and SwiGLU are more complex, smooth activation functions incorporating Gaussian and sigmoid-gated linear units, respectively, offering better performance for deep learning models, unlike the simpler, piecewise linear function of ReLU"
    ]
   },
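The lines touched by this commit describe the GELU and SwiGLU activations that LLM feed-forward blocks use in place of ReLU. For context, here is a minimal sketch of both, assuming PyTorch (which the repository uses throughout); the class and parameter names (GELU, SwiGLU, d_in, d_hidden) are illustrative and not taken from this notebook.

# Minimal sketch, not part of the commit; assumes PyTorch is installed.
import torch
import torch.nn as nn


class GELU(nn.Module):
    # Tanh approximation of the Gaussian Error Linear Unit (as used in GPT-2).
    def forward(self, x):
        return 0.5 * x * (1.0 + torch.tanh(
            torch.sqrt(torch.tensor(2.0 / torch.pi)) *
            (x + 0.044715 * torch.pow(x, 3))
        ))


class SwiGLU(nn.Module):
    # Swish-Gated Linear Unit: SiLU(x W1) multiplied elementwise with x W2.
    def __init__(self, d_in, d_hidden):
        super().__init__()
        self.w1 = nn.Linear(d_in, d_hidden, bias=False)  # gate branch
        self.w2 = nn.Linear(d_in, d_hidden, bias=False)  # linear branch

    def forward(self, x):
        return nn.functional.silu(self.w1(x)) * self.w2(x)


x = torch.randn(2, 4, 768)           # (batch, tokens, embedding dim)
print(GELU()(x).shape)               # torch.Size([2, 4, 768])
print(SwiGLU(768, 1024)(x).shape)    # torch.Size([2, 4, 1024])

Note that, unlike GELU, SwiGLU is not a pure elementwise function: it gates one linear projection with the SiLU (Swish) of another, which is why the corrected expansion "Swish-Gated Linear Unit" is the accurate one.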
|