Sebastian Raschka | 27fa95d24b | 2025-08-15 08:38:48 -05:00
Fix qk_norm comment (#769)

Sebastian Raschka | 07c3122b5c | 2025-08-14 18:36:07 -05:00
Qwen3 and Llama3 equivalency tests with HF transformers (#768)
* Qwen3 and Llama3 equivalency tests with HF transformers
* update

Sebastian Raschka | 71ef67be46 | 2025-08-01 19:13:17 -05:00
Qwen3 Coder Flash & MoE from Scratch (#760)
* Qwen3 Coder Flash & MoE from Scratch
* update
* refinements
* updates
* update
* update
* update

Sebastian Raschka | a200698698 | 2025-07-10 08:09:35 -05:00
Batched KV Cache Inference for Qwen3 (#735)

Sebastian Raschka | 7dc1dcbe27 | 2025-07-09 13:52:35 -05:00
Qwen3 tokenizer sanity checks (#730)

Sebastian Raschka | d23b1f07b8 | 2025-07-09 13:16:26 -05:00
Add more sophisticated Qwen3 tokenizer (#729)

Sebastian Raschka | 90c824506c | 2025-07-08 12:56:55 -05:00
Simplify KV cache usage (#728)
* Simplify KV cache usage
* Swap mark text with ghostwriter

Sebastian Raschka | b5bd8d2de2 | 2025-07-08 06:59:46 -05:00
Update Qwen3 tokenizer test (#727)
* Update Qwen3 tokenizer test
* add tokenizers to dev dependencies
* add tokenizers to dev dependencies

Sebastian Raschka | 30645a6d64 | 2025-06-30 17:49:51 -05:00
Handle other Qwen3 tokenizer settings (#716)

Sebastian Raschka | dc2f8e95d4 | 2025-06-28 08:00:23 -05:00
Support different Qwen3 sizes in pkg (#714)

Sebastian Raschka | 58b30e2f7b | 2025-06-23 18:08:49 -05:00
Improve KV cache code for torch.compile (#705)
* Improve KV cache code for torch.compile
* cleanup
* cleanup

Sebastian Raschka | e9ffdbace4 | 2025-06-23 11:06:10 -05:00
CPU compile performance for Qwen3 models (#704)
* Ch06 classifier function asserts
* Qwen3 cpu compilation perf

Sajjad Baloch | cfdf22330b | 2025-06-22 12:15:12 -05:00
Fix: Typo in appendix_d.py comments. (#682)
* Fix: pkg/llms_from_scratch/appendix_d.py
* minor language typo fix
* fix 691
---------
Co-authored-by: PrinceSajjadHussain <PrinceSajjadHussain@users.noreply.github.com>
Co-authored-by: rasbt <mail@sebastianraschka.com>

Sebastian Raschka | 0b15a00574 | 2025-06-21 17:34:39 -05:00
Qwen3 KV cache (#688)

Daniel Kleine | 2a530b49fe | 2025-06-21 16:07:50 -05:00
added pkg fixes (#676)
Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>

Sebastian Raschka | bb57756444 | 2025-06-21 12:29:04 -05:00
Add GPT-2 KV cache to pkg (#687)

Sebastian Raschka | 9d62ca0598 | 2025-06-21 10:55:20 -05:00
Llama 3 KV Cache (#685)
* Llama 3 KV Cache
* skip expensive tests on Gh actions
* Update __init__.py

Sebastian Raschka | 3d4bce6d57 | 2025-06-19 18:44:38 -05:00
Qwen3 From Scratch (#678)
* Qwen3 From Scratch
* rev other file
* upd
* upd
* upd
* url fixes

Daniel Kleine | 479b0e2aa9 | 2025-06-13 08:21:28 -05:00
fixed gqa qkv code comments (#660)

Sebastian Raschka | a3c4c33347 | 2025-06-12 11:08:02 -05:00
Reduce Llama 3 RoPE memory requirements (#658)
* Llama3 from scratch improvements
* Fix Llama 3 expensive RoPE memory issue
* updates
* update package
* benchmark
* remove unused rescale_theta

Sebastian Raschka | 2dc2df593a | 2025-04-01 12:56:11 -05:00
Llama3Fast (#593)
* Llama3Fast
* Update pkg/llms_from_scratch/tests/test_llama3.py

Sebastian Raschka | 4128a91c1d | 2025-03-31 18:59:47 -05:00
Add Llama 3.2 to pkg (#591)
* Add Llama 3.2 to pkg
* remove redundant attributes
* update tests
* updates
* updates
* updates
* fix link
* fix link

Sebastian Raschka | e55e3e88e1 | 2025-03-27 20:10:23 -05:00
Alt weight loading code via PyTorch (#585)
* Alt weight loading code via PyTorch
* commit additional files

Sebastian Raschka | e07a7abdd5 | 2025-03-27 14:00:25 -05:00
Add GPTModelFast (#584)
* Add GPTModelFast
* update

Sebastian Raschka | cf6fb73553 | 2025-03-23 19:35:12 -05:00
Add readme (#577)

Sebastian Raschka | 7114ccd10d | 2025-03-23 19:28:49 -05:00
Add PyPI package (#576)
* Add PyPI package
* fixes
* fixes