Chapter 4: Implementing a GPT Model from Scratch to Generate Text

 

Main Chapter Code

  • 01_main-chapter-code contains the main chapter code and exercise solutions

 

Bonus Materials

  • 02_performance-analysis contains optional code analyzing the performance of the GPT model(s) implemented in the main chapter
  • 03_kv-cache implements a KV cache to speed up text generation during inference (a minimal sketch of the idea follows this list)
  • 07_moe contains an explanation and implementation of Mixture-of-Experts (MoE)
  • ch05/07_gpt_to_llama contains a step-by-step guide for converting a GPT architecture implementation to Llama 3.2 and loading pretrained weights from Meta AI (it might be interesting to look at alternative architectures after completing chapter 4, but you can also save that for after reading chapter 5)
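
To make the KV-cache idea concrete, here is a minimal, simplified sketch (this is not the actual code from 03_kv-cache; the class name and layout are made up for illustration, and it uses a single attention head without masking or dropout). Instead of re-encoding the full sequence at every generation step, only the newest token is passed in, and its key and value are appended to cached tensors from the previous steps:

```python
import torch

class KVCacheAttentionSketch(torch.nn.Module):
    # Hypothetical single-head attention layer for illustration only
    def __init__(self, d_model):
        super().__init__()
        self.W_q = torch.nn.Linear(d_model, d_model, bias=False)
        self.W_k = torch.nn.Linear(d_model, d_model, bias=False)
        self.W_v = torch.nn.Linear(d_model, d_model, bias=False)
        self.cache_k = None  # keys of all previously processed tokens
        self.cache_v = None  # values of all previously processed tokens

    def forward(self, x_new):
        # x_new has shape (batch, 1, d_model): only the newest token's embedding
        q = self.W_q(x_new)
        k_new, v_new = self.W_k(x_new), self.W_v(x_new)

        if self.cache_k is None:
            self.cache_k, self.cache_v = k_new, v_new
        else:
            # Append instead of recomputing keys/values for the whole sequence
            self.cache_k = torch.cat([self.cache_k, k_new], dim=1)
            self.cache_v = torch.cat([self.cache_v, v_new], dim=1)

        # The new token attends to all cached positions (causal by construction)
        attn_scores = q @ self.cache_k.transpose(1, 2) / (x_new.shape[-1] ** 0.5)
        attn_weights = torch.softmax(attn_scores, dim=-1)
        return attn_weights @ self.cache_v
```

During generation, such a layer is called once per newly sampled token; the cache trades extra memory for avoiding the repeated recomputation of keys and values for all earlier tokens.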

 

Attention Alternatives

  • 04_gqa contains an introduction to Grouped-Query Attention (GQA), which is used by most modern LLMs (Llama 4, gpt-oss, Qwen3, Gemma 3, and many more) as an alternative to regular Multi-Head Attention (MHA); a simplified code sketch follows this list
  • 05_mla contains an introduction to Multi-Head Latent Attention (MLA), which is used by DeepSeek V3 as an alternative to regular Multi-Head Attention (MHA)
  • 06_swa contains an introduction to Sliding Window Attention (SWA), which is used by Gemma 3 and others
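
To give a rough idea of how GQA differs from the multi-head attention implemented in the main chapter, below is a simplified sketch (this is not the code from 04_gqa; the class and parameter names are made up for illustration, and details such as dropout and the output projection are omitted). The query projection keeps the full number of heads, while keys and values are projected to a smaller number of groups that are then shared across query heads:

```python
import torch

class GQASketch(torch.nn.Module):
    # Hypothetical grouped-query attention layer for illustration only
    def __init__(self, d_model, num_heads, num_kv_groups):
        super().__init__()
        assert num_heads % num_kv_groups == 0
        self.num_heads = num_heads
        self.num_kv_groups = num_kv_groups
        self.head_dim = d_model // num_heads
        # Queries keep all heads; keys/values use fewer "grouped" heads
        self.W_q = torch.nn.Linear(d_model, num_heads * self.head_dim, bias=False)
        self.W_k = torch.nn.Linear(d_model, num_kv_groups * self.head_dim, bias=False)
        self.W_v = torch.nn.Linear(d_model, num_kv_groups * self.head_dim, bias=False)

    def forward(self, x):
        b, t, _ = x.shape
        q = self.W_q(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.W_k(x).view(b, t, self.num_kv_groups, self.head_dim).transpose(1, 2)
        v = self.W_v(x).view(b, t, self.num_kv_groups, self.head_dim).transpose(1, 2)

        # Share each key/value head across a group of query heads
        group_size = self.num_heads // self.num_kv_groups
        k = k.repeat_interleave(group_size, dim=1)
        v = v.repeat_interleave(group_size, dim=1)

        # Standard scaled dot-product attention with a causal mask
        scores = q @ k.transpose(2, 3) / (self.head_dim ** 0.5)
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))
        weights = torch.softmax(scores, dim=-1)
        return (weights @ v).transpose(1, 2).reshape(b, t, -1)


# Example usage (hypothetical dimensions):
# x = torch.randn(2, 6, 64)
# gqa = GQASketch(d_model=64, num_heads=8, num_kv_groups=2)
# print(gqa(x).shape)  # torch.Size([2, 6, 64])
```

Setting num_kv_groups equal to num_heads recovers regular MHA, and num_kv_groups=1 corresponds to multi-query attention; the main savings come from smaller key/value projection matrices and a smaller KV cache during inference.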

 

More

In the video below, I provide a code-along session that covers some of the chapter contents as supplementary material.



Link to the video