LLMs-from-scratch

mirror of https://github.com/rasbt/LLMs-from-scratch.git synced 2025-11-29 00:21:54 +00:00

Author	SHA1	Message	Date
Sebastian Raschka	218221ab62	Mixture-of-Experts intro (#888 )	2025-10-19 22:17:59 -05:00
Sebastian Raschka	27b6dfab9e	Make it easier to toggle between thinking and instruct variants (#887 )	2025-10-16 20:37:31 -05:00
Sebastian Raschka	7fe4874dda	Update the compression rate comment in MLA (#883 ) * compression comment * update	2025-10-14 11:10:06 -05:00
Sebastian Raschka	b969b3ef7a	Use figure numbers in ch05-7 (#881 )	2025-10-13 16:26:35 -05:00
Sebastian Raschka	bf039ff3dc	Add alternative attention structure (#880 )	2025-10-13 14:31:13 -05:00
Sebastian Raschka	6eb6adfa33	sliding window attention (#879 )	2025-10-12 22:13:20 -05:00
Sebastian Raschka	21f0617ea3	Add other appendices for completeness (#878 ) * Add other appendices for completeness * update * update * Update	2025-10-12 19:04:53 -05:00
rasbt	44eda5340a	rm plot	2025-10-12 08:55:03 -05:00
Sebastian Raschka	9b9586688d	Multi-Head Latent Attention (#876 ) * Multi-Head Latent Attention * update	2025-10-11 20:08:30 -05:00
Sebastian Raschka	bf27ad1485	Use GB instead of GiB consistently (#875 )	2025-10-11 09:11:33 -05:00
Sebastian Raschka	c814814d72	Grouped-Query Attention memory (#874 ) * GQA memory * remove redundant code * update links * update	2025-10-11 08:44:19 -05:00
rasbt	b8e12e1dd1	Use inference_device	2025-10-09 10:59:17 -05:00
Sebastian Raschka	fecfdd16ff	Add simpler BPE, and make previous BPE better (#870 ) * Add simpler BPE, and make previous BPE better * update * Update README.md	2025-10-08 22:22:34 -05:00
Sebastian Raschka	1164cb3e8f	Qwen3 and evaluation bonus materials (#869 )	2025-10-08 18:22:19 -05:00
Sebastian Raschka	7bd263144e	Switch from urllib to requests to improve reliability (#867 ) * Switch from urllib to requests to improve reliability * Keep ruff linter-specific * update * update * update	2025-10-07 15:22:59 -05:00
Sebastian Raschka	8552565bda	Add missing comma in imports in README (#865 )	2025-10-06 16:03:04 -05:00
Sebastian Raschka	7084123d10	Note about output dimensions (#862 )	2025-10-01 10:47:04 -05:00
Sebastian Raschka	4d9f9dcb6c	Update ollama address (#861 )	2025-09-30 21:05:53 -05:00
casinca	00c240ff87	some typo fixes (#858 ) * fix(typo): correct scaling * fix(typo): correct comment for `instruct`	2025-09-30 11:18:02 -05:00
Sebastian Raschka	458f2d9b67	Test dependencies with Python 3.13 (#843 ) * Custom python 3.13 entry in pyproject.toml * amend * update * update * update * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * update	2025-09-27 08:38:07 -05:00
Sebastian Raschka	47867bc1cb	Update generate script (#847 ) * Custom python 3.13 entry in pyproject.toml * amend * Update generate script * update * Update pyproject.toml	2025-09-27 08:03:54 -05:00
Sebastian Raschka	9bc827ea7e	Numerically stable generate on mps (#849 ) * Numerically stable generate on mps * add file	2025-09-26 22:42:44 -05:00
Sebastian Raschka	f492c949d3	Requirements update (#851 ) * Requirements update * Code change to tricker workers * update	2025-09-26 22:19:57 -05:00
Sebastian Raschka	322000d833	Windows compile (#845 ) * Custom python 3.13 entry in pyproject.toml * amend * Note about compile on Windows * update	2025-09-26 12:01:19 -05:00
Sebastian Raschka	3b83705988	Update package dependencies (#842 )	2025-09-22 18:32:39 -05:00
Sebastian Raschka	e742d8af2c	Improve MoE implementation (#841 )	2025-09-22 15:21:06 -05:00
Sebastian Raschka	20041fb94b	Note about devcontainer root usage (#833 )	2025-09-21 11:12:44 -05:00
Sebastian Raschka	2aa8e8130d	Note about RoPE usage (#839 ) * Note about devcontainer root usage * Add note about RoPE implementation	2025-09-20 16:25:58 +00:00
casinca	42c130623b	`Qwen3Tokenizer` fix for Qwen3 Base models and generation mismatch with HF (#828 ) * prevent `self.apply_chat_template` being applied for base Qwen models * - added no chat template comparison in `test_chat_wrap_and_equivalence` - removed duplicate comparison * Revert "- added no chat template comparison in `test_chat_wrap_and_equivalence`" This reverts commit 3a5ee8cfa19aa7e4874cd5f35171098be760b05f. * Revert "prevent `self.apply_chat_template` being applied for base Qwen models" This reverts commit df504397a8957886c6d6d808615545e37ceffcad. * copied `download_file` in `utils` from https://github.com/rasbt/reasoning-from-scratch/blob/main/reasoning_from_scratch/utils.py * added copy of test `def test_tokenizer_equivalence()` from `reasoning-from-scratch` in `test_qwen3.py` * removed duplicate code fragment in`test_chat_wrap_and_equivalence` * use apply_chat_template * add toggle for instruct model * Update tokenizer usage --------- Co-authored-by: rasbt <mail@sebastianraschka.com>	2025-09-17 08:14:11 -05:00
Synix	bfc6389fab	fix code comment (#834 )	2025-09-17 01:36:02 +00:00
Sebastian Raschka	b6cd0a312f	More efficient angles computation in RoPE (#830 )	2025-09-16 03:23:33 +00:00
Sebastian Raschka	147dc49ab5	rename eval method (#832 )	2025-09-16 02:47:20 +00:00
Sebastian Raschka	8add26cbe9	Improve weight tying handling (#826 ) * Improve weight tying handling * fix	2025-09-14 15:46:48 -05:00
rasbt	1412b139f2	main push to sync github ruleset	2025-09-14 11:59:52 -05:00
Sebastian Raschka	8f3e5b024d	Add LoRA scaling (#823 )	2025-09-14 11:57:55 -05:00
Sebastian Raschka	fc101b710e	Added Apple Silicon GPU device update (#820 ) * Added Apple Silicon GPU device * Added Apple Silicon GPU device * delete: remove unused model.pth file from understanding-buffers * update * update --------- Co-authored-by: missflash <missflash@gmail.com>	2025-09-13 12:48:06 -05:00
Andreas Yin	8e170312fe	fix: correct role of the beta hyperparameter on the DPO loss (#818 ) Increasing beta leads to less divergence between the new model and the reference model.	2025-09-12 20:21:38 -05:00
Sebastian Raschka	32965e0edd	remove redundant next_cache (#817 )	2025-09-11 15:16:08 -05:00
Sebastian Raschka	c7a4362ca4	Add defensive context trimming for multiturn (#815 ) * Add defensive context trimming for multiturn * add all mods	2025-09-09 20:19:00 -05:00
Sebastian Raschka	215abdbcdd	Improve multiturn stopping condition (#814 ) * Improve multiturn stopping condition * improve	2025-09-09 19:37:15 -05:00
Sebastian Raschka	4b0021416a	Clarify Qwen3 notebook purpose (#812 ) * Clarify Qwen3 notebook purpose * Update README.md * Update README.md	2025-09-06 15:31:35 -05:00
Sebastian Raschka	6d175a22df	Fix IMDb spelling (#811 ) * Add SSL instructions * Fix IMDb spelling	2025-09-06 12:04:47 -05:00
Sebastian Raschka	18c6b970ab	Add additional notes on debugging SSL issues (#810 ) * Add SSL instructions * update old pytorch tests * update * update * update * update * update * update * update * update	2025-09-06 11:46:50 -05:00
Sebastian Raschka	290fa10d55	Update code dependencies (#809 )	2025-09-05 16:40:00 -05:00
Sebastian Raschka	5ae41c402e	Fix code comment	2025-09-05 14:02:24 -05:00
Sebastian Raschka	623dc65d5d	Update requirements for Intel Macs (#807 )	2025-09-04 15:07:46 -05:00
Sebastian Raschka	efad18bd0b	Fix accidental indentation	2025-09-04 14:41:08 -05:00
Sebastian Raschka	9bfa92fb3e	Update README.md	2025-09-03 12:41:32 -05:00
Sebastian Raschka	590d8489d0	Update requirements for Intel macOS (#805 )	2025-09-03 12:15:25 -05:00
rasbt	65e67a9681	fix typo	2025-09-02 10:17:51 -05:00

982 Commits