LLMs-from-scratch

mirror of https://github.com/rasbt/LLMs-from-scratch.git synced 2025-08-31 20:08:08 +00:00

Author	SHA1	Message	Date
Sebastian Raschka	dc2f8e95d4	Support different Qwen3 sizes in pkg (#714)	2025-06-28 08:00:23 -05:00
Sebastian Raschka	8c8ff24118	Use test mode arg in ch07 (#713)	2025-06-27 19:28:56 -05:00
Sebastian Raschka	9c666a4d94	Remove unused params for hparam script (#710)	2025-06-25 12:50:32 -05:00
Sebastian Raschka	81be5fab0b	Add Qwen3 1.7, 4B, 8B, and 32B support to from-scratch nb (#709)	2025-06-25 08:53:09 -05:00
Sebastian Raschka	1bce95d70c	Link the other KV cache sections (#708)	2025-06-24 16:52:29 -05:00
Sebastian Raschka	cd03d5008a	Add link to free exercise PDF (#706)	2025-06-24 08:24:02 -05:00
Sebastian Raschka	28b5d4e8a6	Update Llama 3 table for consistency with Qwen3	2025-06-23 18:33:04 -05:00
Sebastian Raschka	58b30e2f7b	Improve KV cache code for torch.compile (#705) * Improve KV cache code for torch.compile * cleanup * cleanup	2025-06-23 18:08:49 -05:00
Martin Ma	ad16b1fbee	Fix bug in masking when kv cache is used. (#697) * Fix bug in masking when kv cache is used. * add tests * dd tests * upd * add kv cache test to gh workflow * explicit mask slicing * upd --------- Co-authored-by: rasbt <mail@sebastianraschka.com>	2025-06-23 13:12:56 -05:00
Sebastian Raschka	e9ffdbace4	CPU compile performance for Qwen3 models (#704) * Ch06 classifier function asserts * Qwen3 cpu compilation perf	2025-06-23 11:06:10 -05:00
Sebastian Raschka	d7c7393af7	Ch06 classifier function asserts (#703)	2025-06-23 08:21:55 -05:00
Shamik	f051a5fe6b	Update README.md (#702) Typo in kv cache readme	2025-06-23 07:21:51 -05:00
Matthew Hernandez	f3fadd6c0a	Fix issue #684: Minor docstring edit (#699)	2025-06-23 07:18:28 -05:00
Sebastian Raschka	984cca3f64	Fix code comment: embed_dim -> d_out (#698)	2025-06-22 16:36:39 -05:00
Sebastian Raschka	a5ea296259	Use more recent sentencepiece tokenizer API (#696)	2025-06-22 13:52:30 -05:00
Sebastian Raschka	2351a1f282	Fix some wording issues in the notes (#695)	2025-06-22 13:46:16 -05:00
Sajjad Baloch	cfdf22330b	Fix: Typo in `appendix_d.py` comments. (#682) * Fix: pkg/llms_from_scratch/appendix_d.py * minor language typo fix * fix 691 --------- Co-authored-by: PrinceSajjadHussain <PrinceSajjadHussain@users.noreply.github.com> Co-authored-by: rasbt <mail@sebastianraschka.com>	2025-06-22 12:15:12 -05:00
casinca	c4b19d7eb6	fix issue #664 - inverted token and pos emb layers (#665) * fix inverted token and pos layers * remove redundant code --------- Co-authored-by: rasbt <mail@sebastianraschka.com>	2025-06-22 12:15:01 -05:00
Sebastian Raschka	0b15a00574	Qwen3 KV cache (#688)	2025-06-21 17:34:39 -05:00
Daniel Kleine	2a530b49fe	added pkg fixes (#676) Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>	2025-06-21 16:07:50 -05:00
Sebastian Raschka	bb57756444	Add GPT-2 KV cache to pkg (#687)	2025-06-21 12:29:04 -05:00
Sebastian Raschka	9d62ca0598	Llama 3 KV Cache (#685) * Llama 3 KV Cache * skip expensive tests on Gh actions * Update __init__.py	2025-06-21 10:55:20 -05:00
Sebastian Raschka	9f6f514191	Fix formatting in Qwen3 nb (#680) * Fix formatting in Qwen3 nb * upd	2025-06-20 07:28:27 -05:00
casinca	374cb94af0	minor readability improvements (#668)	2025-06-19 18:56:49 -05:00
Daniel Kleine	e79bb50a1b	fixed plot_losses (#677)	2025-06-19 18:55:43 -05:00
Sebastian Raschka	3d4bce6d57	Qwen3 From Scratch (#678) * Qwen3 From Scratch * rev other file * upd * upd * upd * url fixes	2025-06-19 18:44:38 -05:00
casinca	e700c66b7a	removed old args in GQA class (#674)	2025-06-17 13:09:53 -05:00
Sebastian Raschka	c488578cae	Optimize KV cache (#673) * Optimize KV cache * style * interpretable generate * interpretable generate * update readme	2025-06-16 16:00:50 -05:00
Sebastian Raschka	e704b5fa50	Optimized KV cache (#672) * Optimized KV cache * typo fix	2025-06-15 14:26:16 -05:00
Sebastian Raschka	9aed6f5a76	Add KV cache (#671)	2025-06-15 09:58:08 -05:00
Sebastian Raschka	d9dd94dac6	Remove redundant model = (#663)	2025-06-13 11:03:55 -05:00
Sebastian Raschka	a0f5326a25	Update pixi (#661) * Llama3 from scratch improvements * Update HF hub version in pixi.toml * Update README.md	2025-06-13 10:50:17 -05:00
Daniel Kleine	479b0e2aa9	fixed gqa qkv code comments (#660)	2025-06-13 08:21:28 -05:00
Greg Gandenberger	2af3cf070c	Update ch07.ipynb (#643) Correct function name	2025-06-13 08:17:10 -05:00
Shimpei Kojio	1446cfd824	fixed video link (#646)	2025-06-13 08:16:18 -05:00
Pratyush Subhadarshi	d142741ef4	Correcting the wrong reference (#649) Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>	2025-06-12 16:35:51 -05:00
Sebastian Raschka	a3c4c33347	Reduce Llama 3 RoPE memory requirements (#658) * Llama3 from scratch improvements * Fix Llama 3 expensive RoPE memory issue * updates * update package * benchmark * remove unused rescale_theta	2025-06-12 11:08:02 -05:00
Sebastian Raschka	55e2a0978a	DeBERTa-v3 baseline (#630) * Llama3 from scratch improvements * deberta-baseline * restore	2025-04-19 21:16:17 -05:00
Sebastian Raschka	02ca4ac42d	BPE cosmetics (#629) * Llama3 from scratch improvements * Cosmetic BPE improvements * restore * Update ch02/05_bpe-from-scratch/bpe-from-scratch.ipynb * Update ch02/05_bpe-from-scratch/bpe-from-scratch.ipynb * endoftext whitespace	2025-04-18 18:57:09 -05:00
Sebastian Raschka	ec062e1099	Dpo vocab size clarification (#628) * Llama3 from scratch improvements * vocab size should be 50257 not 50256 * restore	2025-04-18 17:20:56 -05:00
Sebastian Raschka	3eca919a52	Llama3 from scratch improvements (#621) * Llama3 from scratch improvements * restore	2025-04-16 18:08:26 -05:00
casinca	1cbdcd86c3	Minor DPO fixes (#617) * minor dpo fixes * Update dpo-from-scratch.ipynb metadata diff	2025-04-16 12:56:49 -05:00
Daniel Kleine	6ec8fb3dfe	fixed `<|endoftext|>` token (#620)	2025-04-16 12:15:59 -05:00
Henry Shi	88250d953d	updated exercise 5.3 (#615) * updated exercise 5.3 temperature can be set to 0 to regardless of top_k setting to force deterministic behavior * fix notebook json --------- Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>	2025-04-13 13:06:57 -05:00
Sebastian Raschka	48e98abc8e	add special token handling to bpe from scratch code (#616)	2025-04-13 12:38:22 -05:00
Sebastian Raschka	d5eaa36416	Ch06 and Ch07 videos (#613) * Ch06 and Ch07 videos * exclude google scholar from link checking	2025-04-12 14:51:02 -05:00
PRASHANTH REDDY NIMMAKAYALA	cff41c1fbc	fix: typo in ch07.ipynb (#612)	2025-04-12 10:29:53 -05:00
Sebastian Raschka	b662ec9ada	Improve ModernBERT comments (#606) * Improve modernbert comments * bash code formatting	2025-04-06 18:29:22 -05:00
Sebastian Raschka	9e08fff657	align formulas in notes with code (#605)	2025-04-06 16:46:53 -05:00
Sebastian Raschka	97a199e40b	Disable mask saving as weight in Llama 3 model (#604) * Disable mask saving as weight * update pixi * update pixi	2025-04-06 09:33:36 -05:00


895 Commits