diff --git a/ch05/11_qwen3/README.md b/ch05/11_qwen3/README.md
index 33149cc..97dfe28 100644
--- a/ch05/11_qwen3/README.md
+++ b/ch05/11_qwen3/README.md
@@ -9,6 +9,22 @@ This [standalone-qwen3-moe.ipynb](standalone-qwen3-moe.ipynb) and [standalone-qw
+&nbsp;
+# Qwen3 from-scratch code
+
+The standalone notebooks in this folder contain the from-scratch code in a linear fashion:
+
+1. [standalone-qwen3.ipynb](standalone-qwen3.ipynb): The dense Qwen3 model without bells and whistles
+2. [standalone-qwen3-plus-kvcache.ipynb](standalone-qwen3-plus-kvcache.ipynb): The same as above, but with a KV cache for better inference efficiency
+3. [standalone-qwen3-moe.ipynb](standalone-qwen3-moe.ipynb): Like the first notebook, but the Mixture-of-Experts (MoE) variant
+4. [standalone-qwen3-moe-plus-kvcache.ipynb](standalone-qwen3-moe-plus-kvcache.ipynb): The same as above, but with a KV cache for better inference efficiency
+
+Alternatively, I have also organized the code into a Python package [here](../../pkg/llms_from_scratch/) (including unit tests and CI), which you can run as described below.
+
+&nbsp;
+# Training
+
+The `Qwen3Model` class is implemented in the same style as the `GPTModel` class, so it can be used as a drop-in replacement for training in chapter 5 and for finetuning in chapters 6 and 7.
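The "drop-in replacement" claim in the added Training section rests on both model classes sharing the same interface: token IDs in, next-token logits out. Below is a minimal sketch of why the chapter-5-style loss computation is model-agnostic; the `TinyLM` stand-in and the `calc_loss_batch` helper are illustrative assumptions here, used in place of the actual `GPTModel`/`Qwen3Model` classes so the example is self-contained.

```python
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    """Hypothetical stand-in sharing the GPTModel/Qwen3Model call signature:
    token IDs of shape (batch, seq_len) -> logits of shape (batch, seq_len, vocab_size)."""
    def __init__(self, vocab_size=32, emb_dim=16):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.out = nn.Linear(emb_dim, vocab_size)

    def forward(self, idx):
        return self.out(self.emb(idx))

def calc_loss_batch(input_batch, target_batch, model, device):
    # This loss computation only assumes the (batch, seq_len, vocab_size)
    # logits shape, so it works unchanged for any model with that interface.
    input_batch = input_batch.to(device)
    target_batch = target_batch.to(device)
    logits = model(input_batch)
    return torch.nn.functional.cross_entropy(
        logits.flatten(0, 1), target_batch.flatten()
    )

torch.manual_seed(123)
model = TinyLM()
inputs = torch.randint(0, 32, (2, 8))   # dummy token IDs
targets = torch.randint(0, 32, (2, 8))  # dummy next-token targets
loss = calc_loss_batch(inputs, targets, model, device="cpu")
```

Swapping `TinyLM()` for a `Qwen3Model(...)` or `GPTModel(...)` instance leaves the rest of the training loop untouched.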