Commit 9bb203b1b7 by Sebastian Raschka: Einsum multi-head attention (#345), 2024-09-05


More Efficient Multi-Head Attention Implementations

Summary

The figures below summarize the performance benchmarks (lower is better).

[Figure: Forward pass only]

[Figure: Forward and backward pass]

[Figure: Forward and backward pass after compilation]
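The benchmarked implementations themselves are not shown in this summary. As a point of reference, here is a minimal sketch of the general einsum technique named in the commit: causal multi-head attention where the per-head score and context computations are expressed with `torch.einsum`. The class and parameter names are illustrative, not necessarily those used in the repository.

```python
import torch
import torch.nn as nn


class EinsumMHA(nn.Module):
    """Illustrative causal multi-head attention using torch.einsum."""

    def __init__(self, d_in, d_out, num_heads):
        super().__init__()
        assert d_out % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = d_out // num_heads
        self.W_q = nn.Parameter(torch.randn(d_in, d_out) * 0.02)
        self.W_k = nn.Parameter(torch.randn(d_in, d_out) * 0.02)
        self.W_v = nn.Parameter(torch.randn(d_in, d_out) * 0.02)

    def forward(self, x):
        b, t, _ = x.shape
        # Project inputs, then split the output dim into (heads, head_dim)
        q = torch.einsum("btd,dk->btk", x, self.W_q).view(b, t, self.num_heads, self.head_dim)
        k = torch.einsum("btd,dk->btk", x, self.W_k).view(b, t, self.num_heads, self.head_dim)
        v = torch.einsum("btd,dk->btk", x, self.W_v).view(b, t, self.num_heads, self.head_dim)
        # Scaled dot-product scores per head: (b, heads, query_pos, key_pos)
        scores = torch.einsum("bqhd,bkhd->bhqk", q, k) / self.head_dim ** 0.5
        # Causal mask: each position may only attend to itself and earlier positions
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))
        weights = torch.softmax(scores, dim=-1)
        # Weighted sum over values, then merge heads back into one dimension
        out = torch.einsum("bhqk,bkhd->bqhd", weights, v)
        return out.reshape(b, t, self.num_heads * self.head_dim)


mha = EinsumMHA(d_in=16, d_out=32, num_heads=4)
x = torch.randn(2, 5, 16)
out = mha(x)
print(out.shape)  # torch.Size([2, 5, 32])
```

The "after compilation" figure presumably refers to wrapping such a module with `torch.compile(mha)`, which fuses the einsum and softmax kernels and typically narrows the gap between implementation variants.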