Sebastian Raschka | e742d8af2c | Improve MoE implementation (#841) | 2025-09-22 15:21:06 -05:00

Sebastian Raschka | 32965e0edd | remove redundant next_cache (#817) | 2025-09-11 15:16:08 -05:00

Sebastian Raschka | c7a4362ca4 | Add defensive context trimming for multiturn (#815) | 2025-09-09 20:19:00 -05:00
* Add defensive context trimming for multiturn
* add all mods

Sebastian Raschka | f92b40e4ab | Qwen3 Coder Flash & MoE from Scratch (#760) | 2025-08-01 19:13:17 -05:00
* Qwen3 Coder Flash & MoE from Scratch
* update
* refinements
* updates
* update
* update
* update

Sebastian Raschka | 3c9dc4807b | Simplify KV cache usage (#728) | 2025-07-08 12:56:55 -05:00
* Simplify KV cache usage
* Swap mark text with ghostwriter

Sebastian Raschka | c4ec55edac | Support different Qwen3 sizes in pkg (#714) | 2025-06-28 08:00:23 -05:00

Sebastian Raschka | 81eda38d3b | Improve KV cache code for torch.compile (#705) | 2025-06-23 18:08:49 -05:00
* Improve KV cache code for torch.compile
* cleanup
* cleanup

Sebastian Raschka | 0a2e8c39c4 | Qwen3 KV cache (#688) | 2025-06-21 17:34:39 -05:00

Sebastian Raschka | fdc3e1b701 | Add GPT-2 KV cache to pkg (#687) | 2025-06-21 12:29:04 -05:00

Sebastian Raschka | 3be0f3202a | Llama 3 KV Cache (#685) | 2025-06-21 10:55:20 -05:00
* Llama 3 KV Cache
* skip expensive tests on Gh actions
* Update __init__.py