mirror of
https://github.com/rasbt/LLMs-from-scratch.git
synced 2025-12-11 06:51:21 +00:00
Clarify Qwen3 notebook purpose (#812)
* Clarify Qwen3 notebook purpose
* Update README.md
* Update README.md
parent 6d175a22df
commit 4b0021416a
@@ -9,6 +9,22 @@ This [standalone-qwen3-moe.ipynb](standalone-qwen3-moe.ipynb) and [standalone-qw

<img src="https://sebastianraschka.com/images/LLMs-from-scratch-images/bonus/qwen/qwen3-coder-flash-overview.webp?123" width="430px">

# Qwen3 from-scratch code

The standalone notebooks in this folder contain from-scratch code in a linear fashion:

1. [standalone-qwen3.ipynb](standalone-qwen3.ipynb): The dense Qwen3 model without bells and whistles
2. [standalone-qwen3-plus-kvcache.ipynb](standalone-qwen3-plus-kvcache.ipynb): Same as above, but with a KV cache for more efficient inference
3. [standalone-qwen3-moe.ipynb](standalone-qwen3-moe.ipynb): Like the first notebook, but for the Mixture-of-Experts (MoE) variant
4. [standalone-qwen3-moe-plus-kvcache.ipynb](standalone-qwen3-moe-plus-kvcache.ipynb): Same as the MoE notebook above, but with a KV cache for more efficient inference

I also organized the code into a Python package [here](../../pkg/llms_from_scratch/) (including unit tests and CI), which you can run as described below.

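To give a rough idea of why the KV-cache notebooks are more efficient at inference time, here is a minimal, dependency-free sketch. It is not the code from the notebooks: the `project` function is a hypothetical stand-in for the learned key/value projections, the token vector itself is used as the query, and there is only a single attention head. The point is only that caching keys and values yields the same attention outputs while projecting each token once instead of once per decoding step.

```python
import math


def project(x, scale):
    # Hypothetical stand-in for a learned key or value projection.
    return [scale * xi for xi in x]


def attend(query, keys, values):
    # Scaled dot-product attention of one query over all keys/values.
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    max_score = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - max_score) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]


def decode_without_cache(tokens):
    # Recompute keys and values for the whole prefix at every step.
    outputs, projections = [], 0
    for t in range(1, len(tokens) + 1):
        prefix = tokens[:t]
        keys = [project(x, 0.5) for x in prefix]
        values = [project(x, 2.0) for x in prefix]
        projections += len(prefix)
        outputs.append(attend(prefix[-1], keys, values))
    return outputs, projections


def decode_with_cache(tokens):
    # Project only the newest token and append it to the cache.
    cached_keys, cached_values = [], []
    outputs, projections = [], 0
    for x in tokens:
        cached_keys.append(project(x, 0.5))
        cached_values.append(project(x, 2.0))
        projections += 1
        outputs.append(attend(x, cached_keys, cached_values))
    return outputs, projections


tokens = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [1.0, 1.0]]
no_cache_out, no_cache_proj = decode_without_cache(tokens)
cache_out, cache_proj = decode_with_cache(tokens)
print(no_cache_out == cache_out)  # same attention outputs either way
print(no_cache_proj, cache_proj)  # 10 projections versus 4
```

In this toy setup, the number of key/value projections grows quadratically with sequence length without the cache and linearly with it, at the cost of keeping the cached keys and values in memory.
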
# Training

The `Qwen3Model` class is implemented in a style similar to the `GPTModel` class, so it can be used as a drop-in replacement for training in chapter 5 and for finetuning in chapters 6 and 7.