This standalone-gemma3.ipynb Jupyter notebook in this folder contains a from-scratch implementation of Gemma 3 270M. It requires about 2 GB of RAM to run.

The alternative standalone-gemma3-plus-kvcache.ipynb notebook adds a KV cache for better runtime performance (but adds more code complexity). To learn more about KV caching, see my Understanding and Coding the KV Cache in LLMs from Scratch article.

Model	Mode	Hardware	Tokens/sec	GPU Memory (VRAM)
Gemma3Model 270M	Regular	Mac Mini M4 CPU	8	-
Gemma3Model 270M	Regular compiled	Mac Mini M4 CPU	9	-
Gemma3Model 270M	KV cache	Mac Mini M4 CPU	130	-
Gemma3Model 270M	KV cache compiled	Mac Mini M4 CPU	224	-

Gemma3Model 270M	Regular	Mac Mini M4 GPU	16	-
Gemma3Model 270M	Regular compiled	Mac Mini M4 GPU	Error	-
Gemma3Model 270M	KV cache	Mac Mini M4 GPU	23	-
Gemma3Model 270M	KV cache compiled	Mac Mini M4 GPU	Error	-

Gemma3Model 270M	Regular	Nvidia A100 GPU	28	1.84 GB
Gemma3Model 270M	Regular compiled	Nvidia A100 GPU	128	2.12 GB
Gemma3Model 270M	KV cache	Nvidia A100 GPU	26	1.77 GB
Gemma3Model 270M	KV cache compiled	Nvidia A100 GPU	99	2.12 GB

Below is a side-by-side comparison with Qwen3 0.6B as a reference model; if you are interested in the Qwen3 0.6B standalone notebook, you can find it here.

To learn more about the architecture differences and read about comparisons with other architectures, see my The Big LLM Architecture Comparison: From DeepSeek-V3 to Kimi K2: A Look At Modern LLM Architecture Design article.