LLMs-from-scratch

mirror of https://github.com/rasbt/LLMs-from-scratch.git synced 2025-11-30 17:11:36 +00:00

Author	SHA1	Message	Date
Sebastian Raschka	984cca3f64	Fix code comment: embed_dim -> d_out (#698 )	2025-06-22 16:36:39 -05:00
Sebastian Raschka	a5ea296259	Use more recent sentencepiece tokenizer API (#696 )	2025-06-22 13:52:30 -05:00
Sebastian Raschka	2351a1f282	Fix some wording issues in the notes (#695 )	2025-06-22 13:46:16 -05:00
Sajjad Baloch	cfdf22330b	Fix: Typo in `appendix_d.py` comments. (#682 ) * Fix: pkg/llms_from_scratch/appendix_d.py * minor language typo fix * fix 691 --------- Co-authored-by: PrinceSajjadHussain <PrinceSajjadHussain@users.noreply.github.com> Co-authored-by: rasbt <mail@sebastianraschka.com>	2025-06-22 12:15:12 -05:00
casinca	c4b19d7eb6	fix issue #664 - inverted token and pos emb layers (#665 ) * fix inverted token and pos layers * remove redundant code --------- Co-authored-by: rasbt <mail@sebastianraschka.com>	2025-06-22 12:15:01 -05:00
Sebastian Raschka	0b15a00574	Qwen3 KV cache (#688 )	2025-06-21 17:34:39 -05:00
Daniel Kleine	2a530b49fe	added pkg fixes (#676 ) Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>	2025-06-21 16:07:50 -05:00
Sebastian Raschka	bb57756444	Add GPT-2 KV cache to pkg (#687 )	2025-06-21 12:29:04 -05:00
Sebastian Raschka	9d62ca0598	Llama 3 KV Cache (#685 ) * Llama 3 KV Cache * skip expensive tests on Gh actions * Update __init__.py	2025-06-21 10:55:20 -05:00
Sebastian Raschka	9f6f514191	Fix formatting in Qwen3 nb (#680 ) * Fix formatting in Qwen3 nb * upd	2025-06-20 07:28:27 -05:00
casinca	374cb94af0	minor readability improvements (#668 )	2025-06-19 18:56:49 -05:00
Daniel Kleine	e79bb50a1b	fixed plot_losses (#677 )	2025-06-19 18:55:43 -05:00
Sebastian Raschka	3d4bce6d57	Qwen3 From Scratch (#678 ) * Qwen3 From Scratch * rev other file * upd * upd * upd * url fixes	2025-06-19 18:44:38 -05:00
casinca	e700c66b7a	removed old args in GQA class (#674 )	2025-06-17 13:09:53 -05:00
Sebastian Raschka	c488578cae	Optimize KV cache (#673 ) * Optimize KV cache * style * interpretable generate * interpretable generate * update readme	2025-06-16 16:00:50 -05:00
Sebastian Raschka	e704b5fa50	Optimized KV cache (#672 ) * Optimized KV cache * typo fix	2025-06-15 14:26:16 -05:00
Sebastian Raschka	9aed6f5a76	Add KV cache (#671 )	2025-06-15 09:58:08 -05:00
Sebastian Raschka	d9dd94dac6	Remove redundant model = (#663 )	2025-06-13 11:03:55 -05:00
Sebastian Raschka	a0f5326a25	Update pixi (#661 ) * Llama3 from scratch improvements * Update HF hub version in pixi.toml * Update README.md	2025-06-13 10:50:17 -05:00
Daniel Kleine	479b0e2aa9	fixed gqa qkv code comments (#660 )	2025-06-13 08:21:28 -05:00
Greg Gandenberger	2af3cf070c	Update ch07.ipynb (#643 ) Correct function name	2025-06-13 08:17:10 -05:00
Shimpei Kojio	1446cfd824	fixed video link (#646 )	2025-06-13 08:16:18 -05:00
Pratyush Subhadarshi	d142741ef4	Correcting the wrong reference (#649 ) Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>	2025-06-12 16:35:51 -05:00
Sebastian Raschka	a3c4c33347	Reduce Llama 3 RoPE memory requirements (#658 ) * Llama3 from scratch improvements * Fix Llama 3 expensive RoPE memory issue * updates * update package * benchmark * remove unused rescale_theta	2025-06-12 11:08:02 -05:00
Sebastian Raschka	55e2a0978a	DeBERTa-v3 baseline (#630 ) * Llama3 from scratch improvements * deberta-baseline * restore	2025-04-19 21:16:17 -05:00
Sebastian Raschka	02ca4ac42d	BPE cosmetics (#629 ) * Llama3 from scratch improvements * Cosmetic BPE improvements * restore * Update ch02/05_bpe-from-scratch/bpe-from-scratch.ipynb * Update ch02/05_bpe-from-scratch/bpe-from-scratch.ipynb * endoftext whitespace	2025-04-18 18:57:09 -05:00
Sebastian Raschka	ec062e1099	Dpo vocab size clarification (#628 ) * Llama3 from scratch improvements * vocab size should be 50257 not 50256 * restore	2025-04-18 17:20:56 -05:00
Sebastian Raschka	3eca919a52	Llama3 from scratch improvements (#621 ) * Llama3 from scratch improvements * restore	2025-04-16 18:08:26 -05:00
casinca	1cbdcd86c3	Minor DPO fixes (#617 ) * minor dpo fixes * Update dpo-from-scratch.ipynb metadata diff	2025-04-16 12:56:49 -05:00
Daniel Kleine	6ec8fb3dfe	fixed `<|endoftext|>` token (#620 )	2025-04-16 12:15:59 -05:00
Henry Shi	88250d953d	updated exercise 5.3 (#615 ) * updated exercise 5.3 temperature can be set to 0 to regardless of top_k setting to force deterministic behavior * fix notebook json --------- Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>	2025-04-13 13:06:57 -05:00
Sebastian Raschka	48e98abc8e	add special token handling to bpe from scratch code (#616 )	2025-04-13 12:38:22 -05:00
Sebastian Raschka	d5eaa36416	Ch06 and Ch07 videos (#613 ) * Ch06 and Ch07 videos * exclude google scholar from link checking	2025-04-12 14:51:02 -05:00
PRASHANTH REDDY NIMMAKAYALA	cff41c1fbc	fix: typo in ch07.ipynb (#612 )	2025-04-12 10:29:53 -05:00
Sebastian Raschka	b662ec9ada	Improve ModernBERT comments (#606 ) * Improve modernbert comments * bash code formatting	2025-04-06 18:29:22 -05:00
Sebastian Raschka	9e08fff657	align formulas in notes with code (#605 )	2025-04-06 16:46:53 -05:00
Sebastian Raschka	97a199e40b	Disable mask saving as weight in Llama 3 model (#604 ) * Disable mask saving as weight * update pixi * update pixi	2025-04-06 09:33:36 -05:00
Sebastian Raschka	c43d7ef663	reformat nbs (#602 )	2025-04-05 16:18:27 -05:00
Sebastian Raschka	ab17357474	Correct BERT experiments (#600 )	2025-04-05 10:05:15 -05:00
Sebastian Raschka	14f976e024	Add ModernBERT (#598 )	2025-04-05 09:13:30 -05:00
Sebastian Raschka	396e96ab07	Fix Llama language typo in bonus materials (#597 )	2025-04-02 21:41:36 -05:00
Sebastian Raschka	f61baf86f2	Fix link (#596 )	2025-04-02 09:47:07 -05:00
Sebastian Raschka	2dc2df593a	Llama3Fast (#593 ) * Llama3Fast * Update pkg/llms_from_scratch/tests/test_llama3.py	2025-04-01 12:56:11 -05:00
Sebastian Raschka	4128a91c1d	Add Llama 3.2 to pkg (#591 ) * Add Llama 3.2 to pkg * remove redundant attributes * update tests * updates * updates * updates * fix link * fix link	2025-03-31 18:59:47 -05:00
casinca	d7c316533a	removing unused RoPE parameters (#590 ) * removing unused RoPE parameters * remove redundant context_length in GQA --------- Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>	2025-03-31 17:10:39 -05:00
Sebastian Raschka	d75f74bd0c	Fix data download if UCI is temporarily down (#592 )	2025-03-31 16:25:53 -05:00
Sebastian Raschka	0bdcce4e40	Clarify dataset length in chapter 2 (#589 )	2025-03-30 16:01:37 -05:00
Sebastian Raschka	4e3b752e5e	Memory optimized Llama (#588 ) * Memory optimized Llama * re-ad login	2025-03-30 15:18:12 -05:00
Sebastian Raschka	e55e3e88e1	Alt weight loading code via PyTorch (#585 ) * Alt weight loading code via PyTorch * commit additional files	2025-03-27 20:10:23 -05:00
Sebastian Raschka	e07a7abdd5	Add GPTModelFast (#584 ) * Add GPTModelFast * update	2025-03-27 14:00:25 -05:00


882 Commits