Add alternative attention structure (#880)

Sebastian Raschka 2025-10-13 14:31:13 -05:00 committed by GitHub
parent 6eb6adfa33
commit bf039ff3dc
2 changed files with 17 additions and 3 deletions

@@ -168,6 +168,7 @@ Several folders contain optional materials as a bonus for interested readers:
- **Chapter 4: Implementing a GPT model from scratch**
- [FLOPS Analysis](ch04/02_performance-analysis/flops-analysis.ipynb)
- [KV Cache](ch04/03_kv-cache)
- [Attention alternatives](ch04/#attention-alternatives)
- [Grouped-Query Attention](ch04/04_gqa)
- [Multi-Head Latent Attention](ch04/05_mla)
- [Sliding Window Attention](ch04/06_swa)

@@ -11,11 +11,24 @@
- [02_performance-analysis](02_performance-analysis) contains optional code analyzing the performance of the GPT model(s) implemented in the main chapter
- [03_kv-cache](03_kv-cache) implements a KV cache to speed up the text generation during inference
- [ch05/07_gpt_to_llama](../ch05/07_gpt_to_llama) contains a step-by-step guide for converting the GPT architecture implementation to Llama 3.2 and loading pretrained weights from Meta AI (it might be interesting to look at alternative architectures after completing chapter 4, but you can also save that for after reading chapter 5)
 
## Attention Alternatives
 
<img src="https://sebastianraschka.com/images/LLMs-from-scratch-images/bonus/attention-alternatives/attention-alternatives.webp">
&nbsp;
- [04_gqa](04_gqa) contains an introduction to Grouped-Query Attention (GQA), which is used by most modern LLMs (Llama 4, gpt-oss, Qwen3, Gemma 3, and many more) as an alternative to regular Multi-Head Attention (MHA); a minimal code sketch of the idea follows this list
- [05_mla](05_mla) contains an introduction to Multi-Head Latent Attention (MLA), which is used by DeepSeek V3 as an alternative to regular Multi-Head Attention (MHA)
- [06_swa](06_swa) contains an introduction to Sliding Window Attention (SWA), which is used by Gemma 3 and others
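
To give a rough sense of the GQA idea before diving into the notebooks, here is a minimal PyTorch sketch (not necessarily the exact code in [04_gqa](04_gqa)): keys and values are projected to fewer heads than queries, and each key/value head is then shared by a group of query heads. The class name and the `num_kv_groups` parameter are illustrative choices for this sketch.

```python
import torch
import torch.nn as nn


class GroupedQueryAttention(nn.Module):
    def __init__(self, d_in, d_out, num_heads, num_kv_groups):
        super().__init__()
        assert d_out % num_heads == 0
        assert num_heads % num_kv_groups == 0
        self.num_heads = num_heads
        self.num_kv_groups = num_kv_groups
        self.head_dim = d_out // num_heads

        self.W_query = nn.Linear(d_in, d_out, bias=False)
        # Keys and values get fewer heads than queries (the core GQA idea)
        self.W_key = nn.Linear(d_in, num_kv_groups * self.head_dim, bias=False)
        self.W_value = nn.Linear(d_in, num_kv_groups * self.head_dim, bias=False)
        self.out_proj = nn.Linear(d_out, d_out, bias=False)

    def forward(self, x):
        b, num_tokens, _ = x.shape

        # Shapes: queries (b, num_heads, T, head_dim); keys/values (b, num_kv_groups, T, head_dim)
        q = self.W_query(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.W_key(x).view(b, num_tokens, self.num_kv_groups, self.head_dim).transpose(1, 2)
        v = self.W_value(x).view(b, num_tokens, self.num_kv_groups, self.head_dim).transpose(1, 2)

        # Each key/value head is shared by num_heads // num_kv_groups query heads
        group_size = self.num_heads // self.num_kv_groups
        k = k.repeat_interleave(group_size, dim=1)
        v = v.repeat_interleave(group_size, dim=1)

        # Standard scaled dot-product attention with a causal mask
        attn_scores = q @ k.transpose(2, 3) / self.head_dim**0.5
        mask = torch.triu(
            torch.ones(num_tokens, num_tokens, dtype=torch.bool, device=x.device),
            diagonal=1,
        )
        attn_scores = attn_scores.masked_fill(mask, float("-inf"))
        attn_weights = torch.softmax(attn_scores, dim=-1)

        context = (attn_weights @ v).transpose(1, 2).reshape(b, num_tokens, -1)
        return self.out_proj(context)


torch.manual_seed(123)
gqa = GroupedQueryAttention(d_in=768, d_out=768, num_heads=12, num_kv_groups=4)
out = gqa(torch.randn(2, 16, 768))
print(out.shape)  # torch.Size([2, 16, 768])
```

With `num_heads=12` and `num_kv_groups=4`, each key/value head serves 3 query heads, so the key/value projections (and hence a KV cache) shrink by a factor of 3 compared to regular MHA, at little cost in modeling quality.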
&nbsp;
## More
In the video below, I provide a code-along session that covers some of the chapter contents as supplementary material.