Commit Graph

  • a11965fbd9
    Remove persistent flag from cache buffers (#916) main Sebastian Raschka 2025-11-24 20:10:02 -06:00
  • c19533851f
    Add Olmo 3 README (#915) Sebastian Raschka 2025-11-23 10:53:48 -06:00
  • bc6f335526
    Olmo 3 from scratch (#914) Sebastian Raschka 2025-11-22 22:42:18 -06:00
  • 398b079efa
    RoPE decay plot (#910) Sebastian Raschka 2025-11-17 17:29:49 -06:00
  • 28a8408d4d
    Update README wrt multi-query attention Sebastian Raschka 2025-11-17 16:39:32 -06:00
  • a4094470c7
    Write-up on how to get the most out of this book (#909) Sebastian Raschka 2025-11-12 20:20:48 -06:00
  • 7d92267170
    fix(GatedDeltaNet): Init param A from log of a uniform distrib (#906) casinca 2025-11-09 21:22:52 +01:00
  • 35354fac80
    Use consistent title case rasbt 2025-11-06 15:22:24 -06:00
  • 58f45ae5a7
    Fix empty device issue (#904) Sebastian Raschka 2025-11-05 20:04:44 -06:00
  • bcc73f731d
    n_heads × d_head -> d_head × d_head in DeltaNet (#903) Sebastian Raschka 2025-11-05 18:28:37 -06:00
  • 488bef7e3f
    Image resizing Sebastian Raschka 2025-11-02 21:05:38 -06:00
  • c6b8332a59
    Gated DeltaNet write-up (#901) Sebastian Raschka 2025-11-02 21:03:42 -06:00
  • d6c3990c57
    Training on MPS in PyTorch 2.9 (#900) Sebastian Raschka 2025-11-01 16:55:09 -05:00
  • 27d52d6378
    Fix MHAEinsum weight dimension bug when d_in != d_out (#857) (#893) Aviral Garg 2025-11-01 08:15:31 +05:30
  • b1db33b384
    simplify uv command (#898) Sebastian Raschka 2025-10-31 19:44:57 -05:00
  • 760f4c9ecc
    Add bonus dependencies to pyproject (#897) Sebastian Raschka 2025-10-28 20:36:21 -05:00
  • 0adb5b8c65
    Fix ffn link (#892) Sebastian Raschka 2025-10-21 21:19:44 -05:00
  • 7ca7c47e4a
    Make quote style consistent (#891) Sebastian Raschka 2025-10-21 19:42:33 -05:00
  • 9276edbc37
    - docs(moe): correct arXiv link for DeepSeekMoE (#890) casinca 2025-10-21 02:29:06 +02:00
  • 218221ab62
    Mixture-of-Experts intro (#888) Sebastian Raschka 2025-10-19 22:17:59 -05:00
  • 27b6dfab9e
    Make it easier to toggle between thinking and instruct variants (#887) Sebastian Raschka 2025-10-16 20:37:31 -05:00
  • 7fe4874dda
    Update the compression rate comment in MLA (#883) Sebastian Raschka 2025-10-14 11:10:06 -05:00
  • b969b3ef7a
    Use figure numbers in ch05-7 (#881) Sebastian Raschka 2025-10-13 16:26:35 -05:00
  • bf039ff3dc
    Add alternative attention structure (#880) Sebastian Raschka 2025-10-13 14:31:13 -05:00
  • 6eb6adfa33
    sliding window attention (#879) Sebastian Raschka 2025-10-12 22:13:20 -05:00
  • 21f0617ea3
    Add other appendices for completeness (#878) Sebastian Raschka 2025-10-12 19:04:53 -05:00
  • 44eda5340a
    rm plot rasbt 2025-10-12 08:55:03 -05:00
  • 9b9586688d
    Multi-Head Latent Attention (#876) Sebastian Raschka 2025-10-11 20:08:30 -05:00
  • bf27ad1485
    Use GB instead of GiB consistently (#875) Sebastian Raschka 2025-10-11 09:11:33 -05:00
  • c814814d72
    Grouped-Query Attention memory (#874) Sebastian Raschka 2025-10-11 08:44:19 -05:00
  • b8e12e1dd1
    Use inference_device rasbt 2025-10-09 10:59:17 -05:00
  • fecfdd16ff
    Add simpler BPE, and make previous BPE better (#870) Sebastian Raschka 2025-10-08 22:22:34 -05:00
  • 1164cb3e8f
    Qwen3 and evaluation bonus materials (#869) Sebastian Raschka 2025-10-08 18:22:19 -05:00
  • 7bd263144e
    Switch from urllib to requests to improve reliability (#867) Sebastian Raschka 2025-10-07 15:22:59 -05:00
  • 9f7dbb2493
    Update docker file dockerfile rasbt 2025-10-06 18:31:59 -05:00
  • 8552565bda
    Add missing comma in imports in README (#865) Sebastian Raschka 2025-10-06 16:03:04 -05:00
  • 7084123d10
    Note about output dimensions (#862) Sebastian Raschka 2025-10-01 10:47:04 -05:00
  • 4d9f9dcb6c
    Update ollama address (#861) Sebastian Raschka 2025-09-30 21:05:53 -05:00
  • 00c240ff87
    some typo fixes (#858) casinca 2025-09-30 18:18:02 +02:00
  • 458f2d9b67
    Test dependencies with Python 3.13 (#843) Sebastian Raschka 2025-09-27 08:38:07 -05:00
  • 47867bc1cb
    Update generate script (#847) Sebastian Raschka 2025-09-27 08:03:54 -05:00
  • 9bc827ea7e
    Numerically stable generate on mps (#849) Sebastian Raschka 2025-09-26 22:42:44 -05:00
  • f492c949d3
    Requirements update (#851) Sebastian Raschka 2025-09-26 22:19:57 -05:00
  • b1f852c1ba
    Update requirements.txt requirements-update rasbt 2025-09-26 21:57:22 -05:00
  • 3c10919c32
    Numerically stable generate on mps rasbt 2025-09-26 21:37:25 -05:00
  • 322000d833
    Windows compile (#845) Sebastian Raschka 2025-09-26 12:01:19 -05:00
  • 3b83705988
    Update package dependencies (#842) Sebastian Raschka 2025-09-22 18:32:39 -05:00
  • e742d8af2c
    Improve MoE implementation (#841) Sebastian Raschka 2025-09-22 15:21:06 -05:00
  • 20041fb94b
    Note about devcontainer root usage (#833) Sebastian Raschka 2025-09-21 11:12:44 -05:00
  • 2aa8e8130d
    Note about RoPE usage (#839) Sebastian Raschka 2025-09-20 11:25:58 -05:00
  • 42c130623b
    Qwen3Tokenizer fix for Qwen3 Base models and generation mismatch with HF (#828) casinca 2025-09-17 15:14:11 +02:00
  • bfc6389fab
    fix code comment (#834) Synix 2025-09-17 09:36:02 +08:00
  • 862df48e38
    use apply_chat_template qwen-tokenizer-fix rasbt 2025-09-16 08:12:01 -05:00
  • 8237b3fda0
    removed duplicate code fragment in test_chat_wrap_and_equivalence casinca 2025-09-16 11:32:05 +02:00
  • 16f30a0395
    added copy of test def test_tokenizer_equivalence() from reasoning-from-scratch in test_qwen3.py casinca 2025-09-16 11:12:29 +02:00
  • 4ea2fb4a76
    copied download_file in utils from https://github.com/rasbt/reasoning-from-scratch/blob/main/reasoning_from_scratch/utils.py casinca 2025-09-16 11:10:01 +02:00
  • 186e83c579
    Revert "prevent self.apply_chat_template being applied for base Qwen models" casinca 2025-09-16 09:43:01 +02:00
  • 02a1cb1159
    Revert "- added no chat template comparison in test_chat_wrap_and_equivalence" casinca 2025-09-16 09:42:47 +02:00
  • 701b5ad54d
    Merge branch 'main' into qwen-tokenizer-fix casinca 2025-09-16 09:38:45 +02:00
  • b6cd0a312f
    More efficient angles computation in RoPE (#830) Sebastian Raschka 2025-09-15 22:23:33 -05:00
  • 147dc49ab5
    rename eval method (#832) Sebastian Raschka 2025-09-15 21:47:20 -05:00
  • 3a5ee8cfa1
    - added no chat template comparison in test_chat_wrap_and_equivalence - removed duplicate comparison casinca 2025-09-15 19:30:31 +02:00
  • df504397a8
    prevent self.apply_chat_template being applied for base Qwen models casinca 2025-09-15 16:26:17 +02:00
  • 8add26cbe9
    Improve weight tying handling (#826) Sebastian Raschka 2025-09-14 15:46:48 -05:00
  • 1412b139f2
    main push to sync github ruleset rasbt 2025-09-14 11:59:52 -05:00
  • 8f3e5b024d
    Add LoRA scaling (#823) Sebastian Raschka 2025-09-14 11:57:55 -05:00
  • fc101b710e
    Added Apple Silicon GPU device update (#820) Sebastian Raschka 2025-09-13 12:48:06 -05:00
  • 8e170312fe
    fix: correct role of the beta hyperparameter on the DPO loss (#818) Andreas Yin 2025-09-13 03:21:38 +02:00
  • 32965e0edd
    remove redundant next_cache (#817) Sebastian Raschka 2025-09-11 15:16:08 -05:00
  • c7a4362ca4
    Add defensive context trimming for multiturn (#815) Sebastian Raschka 2025-09-09 20:19:00 -05:00
  • 215abdbcdd
    Improve multiturn stopping condition (#814) Sebastian Raschka 2025-09-09 19:37:15 -05:00
  • 4b0021416a
    Clarify Qwen3 notebook purpose (#812) Sebastian Raschka 2025-09-06 15:31:35 -05:00
  • 6d175a22df
    Fix IMDb spelling (#811) Sebastian Raschka 2025-09-06 12:04:47 -05:00
  • 18c6b970ab
    Add additional notes on debugging SSL issues (#810) Sebastian Raschka 2025-09-06 11:46:50 -05:00
  • 290fa10d55
    Update code dependencies (#809) Sebastian Raschka 2025-09-05 16:40:00 -05:00
  • 5ae41c402e
    Fix code comment Sebastian Raschka 2025-09-05 14:02:24 -05:00
  • 623dc65d5d
    Update requirements for Intel Macs (#807) Sebastian Raschka 2025-09-04 15:07:46 -05:00
  • efad18bd0b
    Fix accidental indentation Sebastian Raschka 2025-09-04 14:41:08 -05:00
  • 9bfa92fb3e
    Update README.md Sebastian Raschka 2025-09-03 12:41:32 -05:00
  • 590d8489d0
    Update requirements for Intel macOS (#805) Sebastian Raschka 2025-09-03 12:15:25 -05:00
  • 65e67a9681
    fix typo rasbt 2025-09-02 10:17:51 -05:00
  • 2d8d6224ed
    added brief explanations about 2 different ways of RoPE implementations (#802) Hayato Hongo 2025-09-03 00:14:36 +09:00
  • 9ea2c57c5f
    simplify rasbt 2025-09-01 22:15:47 -05:00
  • 643f800a94
    remove local config files rasbt 2025-09-01 20:52:40 -05:00
  • 9eee9296d9
    Interactive qwen3 chat interface (#801) Sebastian Raschka 2025-09-01 20:50:25 -05:00
  • 70edd53809
    Improve RoPE (#799) Sebastian Raschka 2025-08-31 11:46:36 -05:00
  • d87d91b23c
    Add KVCache variant of Qwen3 notebook (#800) Sebastian Raschka 2025-08-31 11:11:12 -05:00
  • 0e9cefcdc8
    Update pixi powershell command section (#798) Sebastian Raschka 2025-08-30 08:42:06 -05:00
  • a51ff65488
    reasoning-from-scratch (#793) Sebastian Raschka 2025-08-28 18:36:41 -05:00
  • a3a62c509a
    Improve MHA einsum (#781) Jestine Paul 2025-08-23 04:12:26 +08:00
  • 670f7a4dd0
    - added (missing) Gemma3 bullet point in parent folder's readme.md (#788) casinca 2025-08-22 22:03:47 +02:00
  • 4a84cfccf9
    Minor cosmetic fixes in Gemma 3 nbs (#780) Sebastian Raschka 2025-08-19 21:08:29 -05:00
  • f571b5e493
    Add Gemma3 KV cache variant (#776) Sebastian Raschka 2025-08-19 12:37:49 -05:00
  • 8c1f9ccf54
    Improve MHA einsum (#775) Sebastian Raschka 2025-08-19 10:38:15 -05:00
  • 80d4732456
    add HF equivalency tests for standalone nbs (#774) Sebastian Raschka 2025-08-18 18:58:46 -05:00
  • a6b883c9f9
    Gemma 3 270M From Scratch (#771) Sebastian Raschka 2025-08-17 08:23:05 -05:00
  • 8fd29ed079
    Gemma 3 270M from scratch gemma-3 rasbt 2025-08-16 19:49:38 -05:00
  • e9c1c1da38
    Fix qk_norm comment (#769) Sebastian Raschka 2025-08-15 08:38:48 -05:00
  • 27fa95d24b
    Fix qk_norm comment (#769) Sebastian Raschka 2025-08-15 08:38:48 -05:00
  • b14325e56d
    Qwen3 and Llama3 equivalency tests with HF transformers (#768) Sebastian Raschka 2025-08-14 18:36:07 -05:00