mirror of
https://github.com/rasbt/LLMs-from-scratch.git
synced 2025-11-29 00:21:54 +00:00
Fix ffn link (#892)
* Fix ffn link
* Apply suggestion from @rasbt
* Apply suggestion from @rasbt
This commit is contained in:
parent
7ca7c47e4a
commit
0adb5b8c65
@@ -98,7 +98,7 @@ uv run plot_memory_estimates_moe.py \
## MoE Code Examples
-The [gpt_with_kv_moe.py](gpt_with_kv_moe.py) and [gpt_with_kv_moe.py](gpt_with_kv_moe.py) scripts in this folder provide hands-on examples for comparing the regular FFN and MoE memory usage in the context of a GPT model implementation. Note that both scripts use [SwiGLU](https://arxiv.org/abs/2002.05202) feed-forward modules as shown in the first figure of this page (GPT-2 traditionally uses GELU).
+The [gpt_with_kv_ffn.py](gpt_with_kv_ffn.py) and [gpt_with_kv_moe.py](gpt_with_kv_moe.py) scripts in this folder provide hands-on examples for comparing the regular FFN and MoE memory usage in the context of a GPT model implementation. Note that both scripts use [SwiGLU](https://arxiv.org/abs/2002.05202) feed-forward modules as shown in the first figure of this page (GPT-2 traditionally uses GELU).
**Note: The model is not trained and thus generates nonsensical text. You can find a trained MoE in the bonus materials at [../../ch05/11_qwen3/standalone-qwen3-moe-plus-kvcache.ipynb](../../ch05/11_qwen3/standalone-qwen3-moe-plus-kvcache.ipynb).**
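For context on the SwiGLU feed-forward modules the corrected paragraph refers to, a minimal sketch is shown below. This is an illustrative stand-alone module, not the repository's exact implementation; the dimension values are assumptions chosen for demonstration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwiGLUFFN(nn.Module):
    # Minimal SwiGLU feed-forward sketch (illustrative only); the actual
    # modules in gpt_with_kv_ffn.py / gpt_with_kv_moe.py may differ in
    # dimensions, bias settings, and naming.
    def __init__(self, emb_dim=768, hidden_dim=3072):
        super().__init__()
        self.fc_gate = nn.Linear(emb_dim, hidden_dim, bias=False)  # gate projection
        self.fc_up = nn.Linear(emb_dim, hidden_dim, bias=False)    # up projection
        self.fc_down = nn.Linear(hidden_dim, emb_dim, bias=False)  # down projection

    def forward(self, x):
        # SwiGLU: SiLU(W_gate x) elementwise-multiplied with (W_up x),
        # then projected back to the embedding dimension
        return self.fc_down(F.silu(self.fc_gate(x)) * self.fc_up(x))


x = torch.randn(2, 4, 768)      # (batch, seq_len, emb_dim)
out = SwiGLUFFN()(x)
print(out.shape)                # torch.Size([2, 4, 768])
```

The gated formulation is what distinguishes SwiGLU from the single-projection GELU feed-forward block traditionally used in GPT-2.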
@@ -1 +1 @@
-Subproject commit 3961a7101465ac12cc476bb24ffcb0c27c073982
+Subproject commit f49e9e2aadf4ca688201104864da97ee5ceb2abe