Gemma 3 270M From Scratch

The standalone-gemma3.ipynb Jupyter notebook in this folder contains a from-scratch implementation of Gemma 3 270M. It requires about 2 GB of RAM to run.
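
As a rough sanity check on the memory figure, the snippet below estimates the size of the weights alone. It is a back-of-the-envelope sketch, not taken from the notebook: it assumes a round 270 million parameters, and the helper name `weight_memory_gb` and the two dtypes shown are illustrative choices. Activations and framework overhead come on top of the weights.

```python
# Bytes needed to store one parameter in each dtype (assumed dtypes for illustration)
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def weight_memory_gb(num_params, dtype):
    """Return the memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# 270e6 is a rounded parameter count, assumed from the model name
for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gb(270e6, dtype):.2f} GB")
# float32: 1.08 GB
# bfloat16: 0.54 GB
```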

The alternative standalone-gemma3-plus-kvcache.ipynb notebook adds a KV cache for better runtime performance, at the cost of more code complexity. To learn more about KV caching, see my article Understanding and Coding the KV Cache in LLMs from Scratch.
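
To illustrate the idea behind a KV cache, here is a minimal pure-Python sketch of single-head attention during token-by-token generation. It is not the notebook's implementation: the toy weights `W_Q`, `W_K`, `W_V` and the function names are hypothetical, and real models use tensors, multiple heads, and multiple layers. The point is only that caching keys and values avoids re-projecting the whole prefix at every step while producing the same outputs.

```python
import math


def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def attend(q, keys, values):
    """Scaled dot-product attention of one query over all given positions."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(len(values[0]))]


# Toy 2-dimensional projection weights (illustrative values only)
W_Q = [[1.0, 0.0], [0.0, 1.0]]
W_K = [[0.5, 0.1], [0.2, 0.4]]
W_V = [[0.3, 0.7], [0.9, 0.1]]


def generate_without_cache(tokens):
    """Re-project keys and values for the whole prefix at every step."""
    outputs, kv_projections = [], 0
    for t in range(1, len(tokens) + 1):
        prefix = tokens[:t]
        keys = [matvec(W_K, x) for x in prefix]
        values = [matvec(W_V, x) for x in prefix]
        kv_projections += len(prefix)
        outputs.append(attend(matvec(W_Q, prefix[-1]), keys, values))
    return outputs, kv_projections


def generate_with_cache(tokens):
    """Project each token once and keep its key and value in a cache."""
    outputs, kv_projections = [], 0
    key_cache, value_cache = [], []
    for x in tokens:
        key_cache.append(matvec(W_K, x))
        value_cache.append(matvec(W_V, x))
        kv_projections += 1
        outputs.append(attend(matvec(W_Q, x), key_cache, value_cache))
    return outputs, kv_projections


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
out_plain, n_plain = generate_without_cache(tokens)
out_cached, n_cached = generate_with_cache(tokens)
print(n_plain, n_cached)  # prints: 10 4
```

For a sequence of n tokens, the uncached loop performs n(n+1)/2 key/value projections, while the cached loop performs n. The price is the extra bookkeeping and memory for the cache.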

| Model            | Mode              | Hardware        | Tokens/sec | GPU Memory (VRAM) |
| ---------------- | ----------------- | --------------- | ---------- | ----------------- |
| Gemma3Model 270M | Regular           | Mac Mini M4 CPU | 8          | -                 |
| Gemma3Model 270M | Regular compiled  | Mac Mini M4 CPU | 9          | -                 |
| Gemma3Model 270M | KV cache          | Mac Mini M4 CPU | 130        | -                 |
| Gemma3Model 270M | KV cache compiled | Mac Mini M4 CPU | 224        | -                 |
| Gemma3Model 270M | Regular           | Mac Mini M4 GPU | 16         | -                 |
| Gemma3Model 270M | Regular compiled  | Mac Mini M4 GPU | Error      | -                 |
| Gemma3Model 270M | KV cache          | Mac Mini M4 GPU | 23         | -                 |
| Gemma3Model 270M | KV cache compiled | Mac Mini M4 GPU | Error      | -                 |
| Gemma3Model 270M | Regular           | Nvidia A100 GPU | 28         | 1.84 GB           |
| Gemma3Model 270M | Regular compiled  | Nvidia A100 GPU | 128        | 2.12 GB           |
| Gemma3Model 270M | KV cache          | Nvidia A100 GPU | 26         | 1.77 GB           |
| Gemma3Model 270M | KV cache compiled | Nvidia A100 GPU | 99         | 2.12 GB           |
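
To make the numbers above easier to compare, the following sketch computes each mode's speedup relative to the regular mode on the same hardware. The tokens/sec values are copied from the table; the dictionary layout and the helper name `speedups` are illustrative choices, and the rows that report "Error" are left out.

```python
# Tokens/sec values copied from the table above; "Error" rows are omitted
throughput = {
    "Mac Mini M4 CPU": {"Regular": 8, "Regular compiled": 9,
                        "KV cache": 130, "KV cache compiled": 224},
    "Mac Mini M4 GPU": {"Regular": 16, "KV cache": 23},
    "Nvidia A100 GPU": {"Regular": 28, "Regular compiled": 128,
                        "KV cache": 26, "KV cache compiled": 99},
}


def speedups(by_mode, baseline="Regular"):
    """Return each mode's throughput divided by the baseline mode's throughput."""
    base = by_mode[baseline]
    return {mode: round(tps / base, 2) for mode, tps in by_mode.items()}


for hardware, by_mode in throughput.items():
    print(hardware, speedups(by_mode))
```

In these measurements, the KV cache yields a 16.25x speedup on the M4 CPU, whereas on the A100 it is slightly slower than the regular mode (26 vs. 28 tokens/sec) and compilation accounts for most of the gain.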

Below is a side-by-side comparison with Qwen3 0.6B as a reference model; if you are interested in the Qwen3 0.6B standalone notebook, you can find it here.



To learn more about the architecture differences and how they compare with other architectures, see my article The Big LLM Architecture Comparison: From DeepSeek-V3 to Kimi K2: A Look At Modern LLM Architecture Design.