Clarify Qwen3 notebook purpose (#812)

* Clarify Qwen3 notebook purpose

* Update README.md

* Update README.md
Sebastian Raschka 2025-09-06 15:31:35 -05:00 committed by GitHub
parent 6d175a22df
commit 4b0021416a

@@ -9,6 +9,22 @@ This [standalone-qwen3-moe.ipynb](standalone-qwen3-moe.ipynb) and [standalone-qw
<img src="https://sebastianraschka.com/images/LLMs-from-scratch-images/bonus/qwen/qwen3-coder-flash-overview.webp?123" width="430px">
&nbsp;
# Qwen3 from-scratch code
The standalone notebooks in this folder contain from-scratch code in a linear fashion:
1. [standalone-qwen3.ipynb](standalone-qwen3.ipynb): The dense Qwen3 model without bells and whistles
2. [standalone-qwen3-plus-kvcache.ipynb](standalone-qwen3-plus-kvcache.ipynb): Same as above but with KV cache for better inference efficiency
3. [standalone-qwen3-moe.ipynb](standalone-qwen3-moe.ipynb): The Mixture-of-Experts (MoE) variant of the first notebook
4. [standalone-qwen3-moe-plus-kvcache.ipynb](standalone-qwen3-moe-plus-kvcache.ipynb): Same as above but with KV cache for better inference efficiency
Alternatively, I also organized the code into a Python package [here](../../pkg/llms_from_scratch/) (including unit tests and CI), which you can run as described below.
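
To illustrate why the KV cache variants listed above are more efficient at inference time, here is a minimal, pure-Python sketch. It is not the code from the notebooks: the `project` helper is a hypothetical stand-in for the learned key and value projections, the raw token vector is used as the query, and only a single attention head is modeled. Under these assumptions, both decoding loops produce the same outputs, but the cached version projects each token only once.

```python
import math


def attend(query, keys, values):
    """Scaled dot-product attention for one query over the given keys/values."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    max_score = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - max_score) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]


def project(token, factor):
    # Hypothetical stand-in for a learned linear projection (e.g., W_k or W_v)
    return [factor * x for x in token]


def decode_without_cache(tokens):
    outputs, num_projected = [], 0
    for t in range(1, len(tokens) + 1):
        prefix = tokens[:t]
        # Keys and values are recomputed for the whole prefix at every step
        keys = [project(tok, 0.5) for tok in prefix]
        values = [project(tok, 2.0) for tok in prefix]
        num_projected += len(prefix)
        outputs.append(attend(prefix[-1], keys, values))
    return outputs, num_projected


def decode_with_cache(tokens):
    outputs, num_projected = [], 0
    key_cache, value_cache = [], []
    for tok in tokens:
        # Only the newest token is projected; earlier entries are reused
        key_cache.append(project(tok, 0.5))
        value_cache.append(project(tok, 2.0))
        num_projected += 1
        outputs.append(attend(tok, key_cache, value_cache))
    return outputs, num_projected


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
    out_plain, n_plain = decode_without_cache(tokens)
    out_cached, n_cached = decode_with_cache(tokens)
    print("Tokens projected without cache:", n_plain)
    print("Tokens projected with cache:", n_cached)
```

In this toy setting, the number of projected tokens grows quadratically with the sequence length without the cache and linearly with it, at the cost of keeping the cached keys and values in memory.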
&nbsp;
# Training
The `Qwen3Model` class is implemented in a style similar to the `GPTModel` class, so it can be used as a drop-in replacement for training in chapter 5 and finetuning in chapters 6 and 7.
&nbsp;