From 0adb5b8c6573e337ce01c69d6553ba031c23e405 Mon Sep 17 00:00:00 2001
From: Sebastian Raschka
Date: Tue, 21 Oct 2025 21:19:44 -0500
Subject: [PATCH] Fix ffn link (#892)

* Fix ffn link

* Apply suggestion from @rasbt

* Apply suggestion from @rasbt

---
 ch04/07_moe/README.md  | 2 +-
 reasoning-from-scratch | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/ch04/07_moe/README.md b/ch04/07_moe/README.md
index 01d1cdc..b0d401f 100644
--- a/ch04/07_moe/README.md
+++ b/ch04/07_moe/README.md
@@ -98,7 +98,7 @@ uv run plot_memory_estimates_moe.py \
 
 ## MoE Code Examples
 
-The [gpt_with_kv_moe.py](gpt_with_kv_moe.py) and [gpt_with_kv_moe.py](gpt_with_kv_moe.py) scripts in this folder provide hands-on examples for comparing the regular FFN and MoE memory usage in the context of a GPT model implementation. Note that both scripts use [SwiGLU](https://arxiv.org/abs/2002.05202) feed-forward modules as shown in the first figure of this page (GPT-2 traditionally uses GELU).
+The [gpt_with_kv_ffn.py](gpt_with_kv_ffn.py) and [gpt_with_kv_moe.py](gpt_with_kv_moe.py) scripts in this folder provide hands-on examples for comparing the regular FFN and MoE memory usage in the context of a GPT model implementation. Note that both scripts use [SwiGLU](https://arxiv.org/abs/2002.05202) feed-forward modules as shown in the first figure of this page (GPT-2 traditionally uses GELU).
 
 **Note: The model is not trained and thus generates nonsensical text. You can find a trained MoE in the bonus materials at [../../ch05/11_qwen3/standalone-qwen3-moe-plus-kvcache.ipynb](../../ch05/11_qwen3/standalone-qwen3-moe-plus-kvcache.ipynb).**
diff --git a/reasoning-from-scratch b/reasoning-from-scratch
index 3961a71..f49e9e2 160000
--- a/reasoning-from-scratch
+++ b/reasoning-from-scratch
@@ -1 +1 @@
-Subproject commit 3961a7101465ac12cc476bb24ffcb0c27c073982
+Subproject commit f49e9e2aadf4ca688201104864da97ee5ceb2abe
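The README text touched by this patch mentions that both scripts use SwiGLU feed-forward modules rather than GPT-2's GELU FFN. For context, here is a minimal sketch of a SwiGLU feed-forward block in PyTorch; the class name, dimensions, and bias-free linear layers are illustrative assumptions, not the repository's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwiGLUFFN(nn.Module):
    """Illustrative SwiGLU feed-forward block: SiLU-gated projection
    followed by a down-projection back to the embedding dimension."""

    def __init__(self, emb_dim, hidden_dim):
        super().__init__()
        self.fc1 = nn.Linear(emb_dim, hidden_dim, bias=False)  # gate branch
        self.fc2 = nn.Linear(emb_dim, hidden_dim, bias=False)  # value branch
        self.fc3 = nn.Linear(hidden_dim, emb_dim, bias=False)  # down-projection

    def forward(self, x):
        # SwiGLU: SiLU(xW1) elementwise-multiplied with xW2, then projected down
        return self.fc3(F.silu(self.fc1(x)) * self.fc2(x))


# Quick shape check with illustrative dimensions
x = torch.randn(2, 4, 64)          # (batch, seq_len, emb_dim)
ffn = SwiGLUFFN(emb_dim=64, hidden_dim=256)
out = ffn(x)
print(out.shape)  # torch.Size([2, 4, 64])
```

The output retains the input's shape, as required for a residual feed-forward block inside a transformer layer.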