LLMs-from-scratch

mirror of https://github.com/rasbt/LLMs-from-scratch.git synced 2025-10-18 11:29:29 +00:00

Author	SHA1	Message	Date
rasbt	713a6e24c9	add tests	2025-06-22 17:48:23 -05:00
martinzwm	ffc5e4e5d6	Fix bug in masking when kv cache is used.	2025-06-22 14:06:00 -07:00
Sebastian Raschka	01be5a42e4	Use more recent sentencepiece tokenizer API (#696)	2025-06-22 13:52:30 -05:00
Sebastian Raschka	bcfdbd7008	Fix some wording issues in the notes (#695)	2025-06-22 13:46:16 -05:00
Sajjad Baloch	661a6e84ee	Fix: Typo in `appendix_d.py` comments. (#682) * Fix: pkg/llms_from_scratch/appendix_d.py * minor language typo fix * fix 691 --------- Co-authored-by: PrinceSajjadHussain <PrinceSajjadHussain@users.noreply.github.com> Co-authored-by: rasbt <mail@sebastianraschka.com>	2025-06-22 12:15:12 -05:00
casinca	564e986496	fix issue #664 - inverted token and pos emb layers (#665) * fix inverted token and pos layers * remove redundant code --------- Co-authored-by: rasbt <mail@sebastianraschka.com>	2025-06-22 12:15:01 -05:00
Sebastian Raschka	0a2e8c39c4	Qwen3 KV cache (#688)	2025-06-21 17:34:39 -05:00
Daniel Kleine	14c054d36c	added pkg fixes (#676) Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>	2025-06-21 16:07:50 -05:00
Sebastian Raschka	fdc3e1b701	Add GPT-2 KV cache to pkg (#687)	2025-06-21 12:29:04 -05:00
Sebastian Raschka	3be0f3202a	Llama 3 KV Cache (#685) * Llama 3 KV Cache * skip expensive tests on Gh actions * Update __init__.py	2025-06-21 10:55:20 -05:00
Sebastian Raschka	c008f95072	Fix formatting in Qwen3 nb (#680) * Fix formatting in Qwen3 nb * upd	2025-06-20 07:28:27 -05:00
casinca	00b8c0a107	minor readability improvements (#668)	2025-06-19 18:56:49 -05:00
Daniel Kleine	15fa6a84f6	fixed plot_losses (#677)	2025-06-19 18:55:43 -05:00
Sebastian Raschka	e719bd86ad	Qwen3 From Scratch (#678) * Qwen3 From Scratch * rev other file * upd * upd * upd * url fixes	2025-06-19 18:44:38 -05:00
casinca	58b8672452	removed old args in GQA class (#674)	2025-06-17 13:09:53 -05:00
Sebastian Raschka	ece59ba587	Optimize KV cache (#673) * Optimize KV cache * style * interpretable generate * interpretable generate * update readme	2025-06-16 16:00:50 -05:00
Sebastian Raschka	ba0370abd1	Optimized KV cache (#672) * Optimized KV cache * typo fix	2025-06-15 14:26:16 -05:00
Sebastian Raschka	2af686d70b	Add KV cache (#671)	2025-06-15 09:58:08 -05:00
Sebastian Raschka	78bbcb3643	Remove redundant model = (#663)	2025-06-13 11:03:55 -05:00
Sebastian Raschka	3dfd7e5f06	Update pixi (#661) * Llama3 from scratch improvements * Update HF hub version in pixi.toml * Update README.md	2025-06-13 10:50:17 -05:00
Daniel Kleine	c2cfb47b1a	fixed gqa qkv code comments (#660)	2025-06-13 08:21:28 -05:00
Greg Gandenberger	7632eb018b	Update ch07.ipynb (#643) Correct function name	2025-06-13 08:17:10 -05:00
Shimpei Kojio	baaa6c9283	fixed video link (#646)	2025-06-13 08:16:18 -05:00
Pratyush Subhadarshi	d56417c34c	Correcting the wrong reference (#649) Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>	2025-06-12 16:35:51 -05:00
Sebastian Raschka	c4cde1c21b	Reduce Llama 3 RoPE memory requirements (#658) * Llama3 from scratch improvements * Fix Llama 3 expensive RoPE memory issue * updates * update package * benchmark * remove unused rescale_theta	2025-06-12 11:08:02 -05:00
Sebastian Raschka	c278745aff	DeBERTa-v3 baseline (#630) * Llama3 from scratch improvements * deberta-baseline * restore	2025-04-19 21:16:17 -05:00
Sebastian Raschka	4ff743051e	BPE cosmetics (#629) * Llama3 from scratch improvements * Cosmetic BPE improvements * restore * Update ch02/05_bpe-from-scratch/bpe-from-scratch.ipynb * Update ch02/05_bpe-from-scratch/bpe-from-scratch.ipynb * endoftext whitespace	2025-04-18 18:57:09 -05:00
Sebastian Raschka	adaf4faaae	Dpo vocab size clarification (#628) * Llama3 from scratch improvements * vocab size should be 50257 not 50256 * restore	2025-04-18 17:20:56 -05:00
Sebastian Raschka	47c036058d	Llama3 from scratch improvements (#621) * Llama3 from scratch improvements * restore	2025-04-16 18:08:26 -05:00
casinca	1b242d01a5	Minor DPO fixes (#617) * minor dpo fixes * Update dpo-from-scratch.ipynb metadata diff	2025-04-16 12:56:49 -05:00
Daniel Kleine	f3d1566c2e	fixed `<|endoftext|>` token (#620)	2025-04-16 12:15:59 -05:00
Henry Shi	02779f5e35	updated exercise 5.3 (#615) * updated exercise 5.3 temperature can be set to 0 to regardless of top_k setting to force deterministic behavior * fix notebook json --------- Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>	2025-04-13 13:06:57 -05:00
Sebastian Raschka	72efebd7f8	add special token handling to bpe from scratch code (#616)	2025-04-13 12:38:22 -05:00
Sebastian Raschka	92b308e512	Ch06 and Ch07 videos (#613) * Ch06 and Ch07 videos * exclude google scholar from link checking	2025-04-12 14:51:02 -05:00
PRASHANTH REDDY NIMMAKAYALA	74b9f1fcde	fix: typo in ch07.ipynb (#612)	2025-04-12 10:29:53 -05:00
Sebastian Raschka	9df572fdf4	Improve ModernBERT comments (#606) * Improve modernbert comments * bash code formatting	2025-04-06 18:29:22 -05:00
Sebastian Raschka	3654571184	align formulas in notes with code (#605)	2025-04-06 16:46:53 -05:00
Sebastian Raschka	67e0680210	Disable mask saving as weight in Llama 3 model (#604) * Disable mask saving as weight * update pixi * update pixi	2025-04-06 09:33:36 -05:00
Sebastian Raschka	f1434652f2	reformat nbs (#602)	2025-04-05 16:18:27 -05:00
Sebastian Raschka	371ab9e8ff	Correct BERT experiments (#600)	2025-04-05 10:05:15 -05:00
Sebastian Raschka	4a9654173c	Add ModernBERT (#598)	2025-04-05 09:13:30 -05:00
Sebastian Raschka	d4c8d8f2c9	Fix Llama language typo in bonus materials (#597)	2025-04-02 21:41:36 -05:00
Sebastian Raschka	49330d0990	Fix link (#596)	2025-04-02 09:47:07 -05:00
Sebastian Raschka	43e25a5165	Llama3Fast (#593) * Llama3Fast * Update pkg/llms_from_scratch/tests/test_llama3.py	2025-04-01 12:56:11 -05:00
Sebastian Raschka	aedad7efc3	Add Llama 3.2 to pkg (#591) * Add Llama 3.2 to pkg * remove redundant attributes * update tests * updates * updates * updates * fix link * fix link	2025-03-31 18:59:47 -05:00
casinca	152a087a37	removing unused RoPE parameters (#590) * removing unused RoPE parameters * remove redundant context_length in GQA --------- Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>	2025-03-31 17:10:39 -05:00
Sebastian Raschka	222803737d	Fix data download if UCI is temporarily down (#592)	2025-03-31 16:25:53 -05:00
Sebastian Raschka	6ea4dd3ae7	Clarify dataset length in chapter 2 (#589)	2025-03-30 16:01:37 -05:00
Sebastian Raschka	0f6894f41e	Memory optimized Llama (#588) * Memory optimized Llama * re-ad login	2025-03-30 15:18:12 -05:00
Sebastian Raschka	3f93d73d6d	Alt weight loading code via PyTorch (#585) * Alt weight loading code via PyTorch * commit additional files	2025-03-27 20:10:23 -05:00

883 Commits