LLMs-from-scratch

mirror of https://github.com/rasbt/LLMs-from-scratch.git synced 2025-08-31 03:50:23 +00:00

Author	SHA1	Message	Date
rasbt	8fd29ed079	Gemma 3 270M from scratch	2025-08-16 19:49:38 -05:00
Sebastian Raschka	27fa95d24b	Fix qk_norm comment (#769 )	2025-08-15 08:38:48 -05:00
Sebastian Raschka	2e3205f747	MoE Nb readability improvements (#761 )	2025-08-01 19:58:18 -05:00
Sebastian Raschka	71ef67be46	Qwen3 Coder Flash & MoE from Scratch (#760 ) * Qwen3 Coder Flash & MoE from Scratch * update * refinements * updates * update * update * update	2025-08-01 19:13:17 -05:00
casinca	d6213a398a	[Minor] Qwen3 typo & optim (#758 ) * typo * remove weight dict after loading	2025-07-28 17:29:44 -05:00
Sebastian Raschka	19c065b342	Interleaved Q and K for RoPE in Llama 2 (#750 )	2025-07-23 08:02:02 -05:00
Sebastian Raschka	b74ab9611e	Minor typo: pply -> Apply (#749 )	2025-07-22 08:19:25 -05:00
Sebastian Raschka	38591b0049	get rid of redundant memory profiler import (#744 )	2025-07-16 07:36:51 -05:00
Sebastian Raschka	a200698698	Batched KV Cache Inference for Qwen3 (#735 )	2025-07-10 08:09:35 -05:00
Sebastian Raschka	d23b1f07b8	Add more sophisticated Qwen3 tokenizer (#729 )	2025-07-09 13:16:26 -05:00
Matthew Hernandez	80c1bb2cf4	Fix issue 724: unused args (#726 ) * Fix issue 724: unused args * Update 02_opt_multi_gpu_ddp.py	2025-07-08 06:37:39 -05:00
Sebastian Raschka	dc2f8e95d4	Support different Qwen3 sizes in pkg (#714 )	2025-06-28 08:00:23 -05:00
Sebastian Raschka	9c666a4d94	Remove unused params for hparam script (#710 )	2025-06-25 12:50:32 -05:00
Sebastian Raschka	81be5fab0b	Add Qwen3 1.7, 4B, 8B, and 32B support to from-scratch nb (#709 )	2025-06-25 08:53:09 -05:00
Sebastian Raschka	28b5d4e8a6	Update Llama 3 table for consistency with Qwen3	2025-06-23 18:33:04 -05:00
Sebastian Raschka	58b30e2f7b	Improve KV cache code for torch.compile (#705 ) * Improve KV cache code for torch.compile * cleanup * cleanup	2025-06-23 18:08:49 -05:00
Sebastian Raschka	e9ffdbace4	CPU compile performance for Qwen3 models (#704 ) * Ch06 classifier function asserts * Qwen3 cpu compilation perf	2025-06-23 11:06:10 -05:00
Sebastian Raschka	984cca3f64	Fix code comment: embed_dim -> d_out (#698 )	2025-06-22 16:36:39 -05:00
Sebastian Raschka	a5ea296259	Use more recent sentencepiece tokenizer API (#696 )	2025-06-22 13:52:30 -05:00
Sebastian Raschka	2351a1f282	Fix some wording issues in the notes (#695 )	2025-06-22 13:46:16 -05:00
Sebastian Raschka	0b15a00574	Qwen3 KV cache (#688 )	2025-06-21 17:34:39 -05:00
Sebastian Raschka	9d62ca0598	Llama 3 KV Cache (#685 ) * Llama 3 KV Cache * skip expensive tests on Gh actions * Update __init__.py	2025-06-21 10:55:20 -05:00
Sebastian Raschka	9f6f514191	Fix formatting in Qwen3 nb (#680 ) * Fix formatting in Qwen3 nb * upd	2025-06-20 07:28:27 -05:00
Daniel Kleine	e79bb50a1b	fixed plot_losses (#677 )	2025-06-19 18:55:43 -05:00
Sebastian Raschka	3d4bce6d57	Qwen3 From Scratch (#678 ) * Qwen3 From Scratch * rev other file * upd * upd * upd * url fixes	2025-06-19 18:44:38 -05:00
casinca	e700c66b7a	removed old args in GQA class (#674 )	2025-06-17 13:09:53 -05:00
Daniel Kleine	479b0e2aa9	fixed gqa qkv code comments (#660 )	2025-06-13 08:21:28 -05:00
Pratyush Subhadarshi	d142741ef4	Correcting the wrong reference (#649 ) Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>	2025-06-12 16:35:51 -05:00
Sebastian Raschka	a3c4c33347	Reduce Llama 3 RoPE memory requirements (#658 ) * Llama3 from scratch improvements * Fix Llama 3 expensive RoPE memory issue * updates * update package * benchmark * remove unused rescale_theta	2025-06-12 11:08:02 -05:00
Sebastian Raschka	3eca919a52	Llama3 from scratch improvements (#621 ) * Llama3 from scratch improvements * restore	2025-04-16 18:08:26 -05:00
Henry Shi	88250d953d	updated exercise 5.3 (#615 ) * updated exercise 5.3 temperature can be set to 0 to regardless of top_k setting to force deterministic behavior * fix notebook json --------- Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>	2025-04-13 13:06:57 -05:00
Sebastian Raschka	97a199e40b	Disable mask saving as weight in Llama 3 model (#604 ) * Disable mask saving as weight * update pixi * update pixi	2025-04-06 09:33:36 -05:00
Sebastian Raschka	c43d7ef663	reformat nbs (#602 )	2025-04-05 16:18:27 -05:00
Sebastian Raschka	396e96ab07	Fix Llama language typo in bonus materials (#597 )	2025-04-02 21:41:36 -05:00
Sebastian Raschka	4128a91c1d	Add Llama 3.2 to pkg (#591 ) * Add Llama 3.2 to pkg * remove redundant attributes * update tests * updates * updates * updates * fix link * fix link	2025-03-31 18:59:47 -05:00
casinca	d7c316533a	removing unused RoPE parameters (#590 ) * removing unused RoPE parameters * remove redundant context_length in GQA --------- Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>	2025-03-31 17:10:39 -05:00
Sebastian Raschka	4e3b752e5e	Memory optimized Llama (#588 ) * Memory optimized Llama * re-ad login	2025-03-30 15:18:12 -05:00
Sebastian Raschka	e55e3e88e1	Alt weight loading code via PyTorch (#585 ) * Alt weight loading code via PyTorch * commit additional files	2025-03-27 20:10:23 -05:00
Sebastian Raschka	c9271ac427	Adjust comment to save compiled model (#583 )	2025-03-27 10:43:45 -05:00
Sebastian Raschka	857acfcc12	Vocab padding clarification (#582 ) * vocab padding clarification * Update ch05/10_llm-training-speed/README.md	2025-03-26 13:19:55 -05:00
Sebastian Raschka	fee7d4bb05	More explicit torchrun usage doc (#578 )	2025-03-24 12:01:03 -05:00
Sebastian Raschka	cf6fb73553	Add readme (#577 )	2025-03-23 19:35:12 -05:00
Sebastian Raschka	7114ccd10d	Add PyPI package (#576 ) * Add PyPI package * fixes * fixes	2025-03-23 19:28:49 -05:00
Sebastian Raschka	85f2bc0a58	Speed comparison figure (#575 )	2025-03-21 11:29:49 -05:00
Greg Gandenberger	1ec5631c70	Fix minor printing issue and note inconsistency across platforms (#563 ) * Fix printing issue and note inconsistency * Rerun notebook	2025-03-14 15:12:09 -05:00
Sebastian Raschka	4fb0ea9d1f	Specify UTF-8 encoding in the json load command explicitely (#557 )	2025-03-05 11:46:21 -06:00
Sebastian Raschka	de60da9a6b	Add a note about "zsh: illegal hardware instruction python" error (#555 )	2025-03-02 15:18:24 -06:00
Sebastian Raschka	fa5760a8de	GitHub markdown updates (#545 ) * GitHub markdown updates * Apply suggestions from code review * Apply suggestions from code review	2025-02-23 12:25:44 -06:00
Sebastian Raschka	5016499d1d	Uv workflow improvements (#531 ) * Uv workflow improvements * Uv workflow improvements * linter improvements * pytproject.toml fixes * pytproject.toml fixes * pytproject.toml fixes * pytproject.toml fixes * pytproject.toml fixes * pytproject.toml fixes * windows fixes * windows fixes * windows fixes * windows fixes * windows fixes * windows fixes * win32 fix * win32 fix * win32 fix * win32 fix * win32 fix * win32 fix * win32 fix * win32 fix * win32 fix * win32 fix * win32 fix * win32 fix * win32 fix * win32 fix * win32 fix * win32 fix * win32 fix * win32 fix * win32 fix	2025-02-16 13:16:51 -06:00
Sebastian Raschka	e818be42e1	Update link to vocab size increase (#526 ) * Update link to vocab size increase * Update ch05/10_llm-training-speed/README.md * Update ch05/10_llm-training-speed/README.md	2025-02-14 08:03:01 -06:00

203 Commits