casinca
58b8672452
removed old args in GQA class ( #674 )
2025-06-17 13:09:53 -05:00
Daniel Kleine
c2cfb47b1a
fixed gqa qkv code comments ( #660 )
2025-06-13 08:21:28 -05:00
Sebastian Raschka
c4cde1c21b
Reduce Llama 3 RoPE memory requirements ( #658 )
...
* Llama3 from scratch improvements
* Fix Llama 3 expensive RoPE memory issue
* updates
* update package
* benchmark
* remove unused rescale_theta
2025-06-12 11:08:02 -05:00
Sebastian Raschka
f1434652f2
reformat nbs ( #602 )
2025-04-05 16:18:27 -05:00
Sebastian Raschka
c21bfe4a23
Add PyPI package ( #576 )
...
* Add PyPI package
* fixes
* fixes
2025-03-23 19:28:49 -05:00
Sebastian Raschka
a08d7aaa84
Uv workflow improvements ( #531 )
...
* Uv workflow improvements
* linter improvements
* pyproject.toml fixes
* windows fixes
* win32 fix
2025-02-16 13:16:51 -06:00
casinca
bb31de8999
[minor] typo & comments ( #441 )
...
* typo & comment
- safe -> save
- commenting code: batch_size, seq_len = in_idx.shape
* comment
- adding # NEW for assert num_heads % num_kv_groups == 0
* update memory wording
---------
Co-authored-by: rasbt <mail@sebastianraschka.com>
2024-11-18 19:52:42 +09:00
Daniel Kleine
e8c2f962e9
minor fixes: Llama 3.2 standalone ( #420 )
...
* minor fixes
* reformat rope base as float
---------
Co-authored-by: rasbt <mail@sebastianraschka.com>
2024-10-25 21:08:06 -05:00
Sebastian Raschka
1516de54a5
RoPE theta rescaling ( #419 )
...
* rope fixes
* update
* update
* cleanup
2024-10-25 15:27:23 -05:00
Daniel Kleine
5ff72c2850
fixed typos ( #414 )
...
* fixed typos
* fixed formatting
* Update ch03/02_bonus_efficient-multihead-attention/mha-implementations.ipynb
* del weights after load into model
---------
Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>
2024-10-24 18:23:53 -05:00
Daniel Kleine
d38083c401
Updated Llama 2 to 3 paths ( #413 )
...
* llama 2 and 3 path fixes
* updated llama 3, 3.1 and 3.2 paths
* updated .gitignore
* Typo fix
---------
Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>
2024-10-24 07:40:08 -05:00
Sebastian Raschka
7cd6a670ed
RoPE updates ( #412 )
...
* RoPE updates
* Apply suggestions from code review
* updates
* updates
* updates
2024-10-23 18:07:49 -05:00
Sebastian Raschka
534a704364
RoPE increase ( #407 )
2024-10-21 19:58:38 -05:00
Sebastian Raschka
1eb0b3810a
Introduce buffers to improve Llama 3.2 efficiency ( #389 )
...
* Introduce buffers to improve Llama 3.2 efficiency
* update
* update
2024-10-06 12:49:04 -05:00
Daniel Kleine
a0c0c765a8
fixed Llama 2 to 3.2 NBs ( #388 )
...
* updated requirements
* fixes llama2 to llama3
* fixed llama 3.2 standalone
* fixed typo
* fixed rope formula
* Update requirements-extra.txt
* Update ch05/07_gpt_to_llama/converting-llama2-to-llama3.ipynb
* Update ch05/07_gpt_to_llama/converting-llama2-to-llama3.ipynb
* Update ch05/07_gpt_to_llama/standalone-llama32.ipynb
---------
Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>
2024-10-06 09:56:55 -05:00
Sebastian Raschka
0972ded530
Add a note about weight tying in Llama 3.2 ( #386 )
2024-10-05 09:20:54 -05:00
Sebastian Raschka
b44096acef
Implement Llama 3.2 ( #383 )
2024-10-05 07:30:47 -05:00