Aviral Garg 27d52d6378
Fix MHAEinsum weight dimension bug when d_in != d_out (#857) (#893)

Previously MHAEinsum initialized weight matrices with shape (d_out, d_in) and used inappropriate einsum notation, causing failures for non-square input-output dimensions. This commit corrects weight initialization to shape (d_in, d_out), updates einsum notation to 'bnd,do->bno', and adds three unit tests to verify parity across different d_in and d_out settings. All tests pass successfully.
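The corrected layout can be sketched as a minimal standalone example (tensor sizes and the `W_query` name here are illustrative, not taken from the repository code): with weights of shape `(d_in, d_out)`, the einsum `'bnd,do->bno'` contracts the input dimension `d` against the weight rows, which works even when `d_in != d_out`.

```python
import torch

torch.manual_seed(0)

b, n, d_in, d_out = 2, 4, 6, 8  # illustrative sizes with d_in != d_out

# Corrected layout: weight matrix has shape (d_in, d_out), so
# 'bnd,do->bno' sums over the shared input dimension d
W_query = torch.nn.Parameter(torch.rand(d_in, d_out))

x = torch.rand(b, n, d_in)
queries = torch.einsum("bnd,do->bno", x, W_query)

# Parity check against a plain batched matrix multiply
assert torch.allclose(queries, x @ W_query)
print(queries.shape)  # torch.Size([2, 4, 8])
```

With the old `(d_out, d_in)` layout, the same subscripts would only line up when the matrix happens to be square, which is why the bug surfaced for non-square input-output dimensions.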

* use pytest

* Update .gitignore

---------

Co-authored-by: rasbt <mail@sebastianraschka.com>
2025-10-31 21:45:31 -05:00

Chapter 3: Coding Attention Mechanisms


Main Chapter Code


Bonus Materials

In the video below, I provide a code-along session that covers some of the chapter contents as supplementary material.



Link to the video