
More Efficient Multi-Head Attention Implementations

Summary

The figures below summarize the performance benchmarks (lower is better).

[Figure: Forward pass only]

[Figure: Forward and backward pass]

[Figure: Forward and backward pass after compilation]
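The benchmarks above time each attention variant in two modes: running only the forward pass, and running the forward pass followed by a backward pass (gradient computation). A minimal sketch of how such a comparison can be set up is shown below; the `scaled_dot_product_attention` and `bench` helpers are illustrative names, not the chapter's actual classes, and the built-in `torch.nn.functional.scaled_dot_product_attention` (available since PyTorch 2.0) stands in for the optimized variants compared in the figures.

```python
import time
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # Naive attention: softmax(Q K^T / sqrt(d_k)) V
    d_k = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d_k**0.5
    return torch.softmax(scores, dim=-1) @ v

def bench(fn, *args, backward=False, reps=10):
    # Wall-clock timing; a rough CPU proxy for the GPU benchmarks
    # summarized in the figures (proper GPU timing would also need
    # torch.cuda.synchronize() and warmup runs).
    start = time.perf_counter()
    for _ in range(reps):
        out = fn(*args)
        if backward:
            out.sum().backward()
    return (time.perf_counter() - start) / reps

batch, heads, seq, d = 2, 4, 64, 32
q = torch.randn(batch, heads, seq, d, requires_grad=True)
k = torch.randn(batch, heads, seq, d, requires_grad=True)
v = torch.randn(batch, heads, seq, d, requires_grad=True)

# Both implementations should agree numerically (no mask, no dropout)
assert torch.allclose(
    scaled_dot_product_attention(q, k, v),
    F.scaled_dot_product_attention(q, k, v),
    atol=1e-5,
)

for name, fn in [("naive", scaled_dot_product_attention),
                 ("F.sdpa", F.scaled_dot_product_attention)]:
    fwd = bench(fn, q, k, v)
    fwd_bwd = bench(fn, q, k, v, backward=True)
    print(f"{name}: forward {fwd*1e3:.2f} ms | fwd+bwd {fwd_bwd*1e3:.2f} ms")
```

The third benchmark mode additionally wraps the model in `torch.compile` before timing, which fuses kernels and typically shifts the relative rankings again.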