# More Efficient Multi-Head Attention Implementations

- [mha-implementations.ipynb](mha-implementations.ipynb) contains and compares different implementations of multi-head attention

### Summary
The figures below summarize the performance benchmarks (lower is better).
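As a rough illustration of how timings of this kind could be collected, the sketch below runs a function a few times as warm-up and then reports the mean wall-clock time per call. It assumes the plotted metric is execution time; `toy_attention` and `benchmark` are hypothetical helpers written for this example, not code from the notebook, and the pure-Python attention is only a stand-in workload.

```python
import math
import time


def toy_attention(queries, keys, values):
    """Single-head scaled dot-product attention on nested Python lists.

    Hypothetical stand-in workload; not one of the notebook's implementations.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        # Scaled dot product of this query with every key
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        # Numerically stable softmax over the scores
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        # Weighted sum of the value vectors
        out.append([
            sum(w / total * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out


def benchmark(fn, *args, warmup=2, repeats=5):
    """Return the mean wall-clock time of fn(*args) in milliseconds."""
    for _ in range(warmup):
        fn(*args)  # warm-up calls are excluded from the measurement
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    elapsed = time.perf_counter() - start
    return elapsed / repeats * 1000.0


# Toy input: 16 "tokens" with 8 features each (arbitrary illustrative sizes)
tokens = [[0.1 * (i + j) for j in range(8)] for i in range(16)]
mean_ms = benchmark(toy_attention, tokens, tokens, tokens)
print(f"toy_attention: {mean_ms:.3f} ms per call")
```

When timing GPU code, kernel launches are asynchronous, so the device has to be synchronized (in PyTorch, with `torch.cuda.synchronize()`) before the clock is read; otherwise the measurement mostly captures launch overhead.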
#### Forward pass only
<a href="mha-implementations.ipynb"><img src="https://sebastianraschka.com/images/LLMs-from-scratch-images/bonus/mha-benchmark/1_forward-only.webp?1" width="500px"></a>

#### Forward and backward pass
<a href="mha-implementations.ipynb"><img src="https://sebastianraschka.com/images/LLMs-from-scratch-images/bonus/mha-benchmark/2_forward-and-backward.webp?1" width="500px"></a>

#### Forward and backward pass after compilation
<a href="mha-implementations.ipynb"><img src="https://sebastianraschka.com/images/LLMs-from-scratch-images/bonus/mha-benchmark/3_forward-and-backward-compiled.webp?1" width="500px"></a>