Sebastian Raschka
2f53bf5fe5
Link the other KV cache sections ( #708 )
2025-06-24 16:52:29 -05:00
Sebastian Raschka
81eda38d3b
Improve KV cache code for torch.compile ( #705 )
* Improve KV cache code for torch.compile
* cleanup
* cleanup
2025-06-23 18:08:49 -05:00
Martin Ma
6522be94be
Fix bug in masking when kv cache is used. ( #697 )
* Fix bug in masking when kv cache is used.
* add tests
* add tests
* upd
* add kv cache test to gh workflow
* explicit mask slicing
* upd
---------
Co-authored-by: rasbt <mail@sebastianraschka.com>
2025-06-23 13:12:56 -05:00
Shamik
f5bc863752
Update README.md ( #702 )
Typo in kv cache readme
2025-06-23 07:21:51 -05:00
Sebastian Raschka
fdc3e1b701
Add GPT-2 KV cache to pkg ( #687 )
2025-06-21 12:29:04 -05:00
Sebastian Raschka
ece59ba587
Optimize KV cache ( #673 )
* Optimize KV cache
* style
* interpretable generate
* interpretable generate
* update readme
2025-06-16 16:00:50 -05:00
Sebastian Raschka
ba0370abd1
Optimized KV cache ( #672 )
* Optimized KV cache
* typo fix
2025-06-15 14:26:16 -05:00
Sebastian Raschka
2af686d70b
Add KV cache ( #671 )
2025-06-15 09:58:08 -05:00
Sebastian Raschka
c21bfe4a23
Add PyPI package ( #576 )
* Add PyPI package
* fixes
* fixes
2025-03-23 19:28:49 -05:00
Sebastian Raschka
73f4342664
add ch04 code along video ( #573 )
2025-03-17 11:20:55 -05:00
Sebastian Raschka
a08d7aaa84
Uv workflow improvements ( #531 )
* Uv workflow improvements
* Uv workflow improvements
* linter improvements
* pyproject.toml fixes
* pyproject.toml fixes
* pyproject.toml fixes
* pyproject.toml fixes
* pyproject.toml fixes
* pytproject.toml fixes
* windows fixes
* windows fixes
* windows fixes
* windows fixes
* windows fixes
* windows fixes
* win32 fix
* win32 fix
* win32 fix
* win32 fix
* win32 fix
* win32 fix
* win32 fix
* win32 fix
* win32 fix
* win32 fix
* win32 fix
* win32 fix
* win32 fix
* win32 fix
* win32 fix
* win32 fix
* win32 fix
* win32 fix
* win32 fix
2025-02-16 13:16:51 -06:00
Sebastian Raschka
68e2efe1c9
Mention small discrepancy due to Dropout non-reproducibility in PyTorch ( #519 )
* Mention small discrepancy due to Dropout non-reproducibility in PyTorch
* bump pytorch version
2025-02-06 14:59:52 -06:00
Sebastian Raschka
126adb7663
Include mathematical breakdown for exercise solution 4.1 ( #483 )
2025-01-14 19:23:00 -06:00
Sebastian Raschka
b6c4b2f9f1
Update bonus section formatting ( #400 )
2024-10-12 10:26:08 -05:00
rasbt
93d9dae95f
update card
2024-10-11 12:15:01 -05:00
rasbt
1f4fca9f8e
update reference numbers
2024-10-11 12:13:10 -05:00
Sebastian Raschka
6d0f59a49c
Add MFU formula as reference material ( #395 )
* add MFU formula as reference material
* Update previous_chapters.py
2024-10-10 19:42:53 -05:00
rasbt
dc1b1a05b0
note about random numbers
2024-09-22 12:02:03 -05:00
Sebastian Raschka
222f7b16f8
update gpt-2 paper url
2024-09-20 07:00:06 -07:00
rasbt
8ad50a3315
update gpt-2 paper link
2024-09-09 06:31:28 -05:00
rasbt
1e48c13e89
update gpt-2 paper link
2024-09-08 15:49:44 -05:00
Sebastian Raschka
08040f024c
Test code in pytorch 2.4 ( #285 )
* test code in pytorch 2.4
* update
2024-07-24 21:53:41 -05:00
Thanh Tran
070a69fc8b
fix typos & inconsistent texts ( #269 )
Co-authored-by: TRAN <you@example.com>
2024-07-17 07:34:51 -05:00
Jeroen Van Goey
48bd72c890
fix typos, add codespell pre-commit hook ( #264 )
* fix typos, add codespell pre-commit hook
* Update .pre-commit-config.yaml
---------
Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>
2024-07-16 07:07:04 -05:00
rasbt
6ffd628bb6
add missing "be" to figure
2024-07-15 08:06:05 -05:00
rasbt
921e91a05f
use correct chapter reference
2024-07-02 17:29:57 -05:00
rasbt
31806828d0
add links to summary sections
2024-06-29 07:33:26 -05:00
rasbt
796f0e2a30
add clarifying note about GELU
2024-06-29 07:14:36 -05:00
rasbt
ab23ca5b1b
force refresh figure
2024-06-29 07:01:37 -05:00
rasbt
6a8acf5135
remove redundant plus sign
2024-06-29 06:59:36 -05:00
Daniel Kleine
81c843bdc0
minor fixes ( #246 )
* removed duplicated white spaces
* Update ch07/01_main-chapter-code/ch07.ipynb
* Update ch07/05_dataset-generation/llama3-ollama.ipynb
* removed duplicated white spaces
* fixed title again
---------
Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>
2024-06-25 17:30:30 -05:00
Sebastian Raschka
5944ab0678
Update README.md
2024-06-22 12:09:02 -05:00
rasbt
283397aaf2
add main and optional sections
2024-06-19 17:48:25 -05:00
Daniel Kleine
bbb2a0c3d5
fixed num_workers ( #229 )
* fixed num_workers
* ch06 & ch07: added num_workers to create_dataloader_v1
2024-06-19 17:36:46 -05:00
rasbt
e24fd98cdf
distinguish better between main chapter code and bonus materials
2024-06-11 21:07:42 -05:00
Daniel Kleine
dcbdc1d2e5
fixes for code ( #206 )
* updated .gitignore
* removed unused GELU import
* fixed model_configs, fixed all tensors on same device
* removed unused tiktoken
* update
* update hparam search
* remove redundant tokenizer argument
---------
Co-authored-by: rasbt <mail@sebastianraschka.com>
2024-06-11 20:59:48 -05:00
rasbt
39c4a887eb
add allowed_special={"<|endoftext|>"}
2024-06-09 06:04:02 -05:00
Sebastian Raschka
72a073bbbf
Remove leftover instances of self.tokenizer ( #201 )
* Remove leftover instances of self.tokenizer
* add endoftext token
2024-06-08 14:57:34 -05:00
rasbt
98d453b666
update formatting
2024-05-24 07:20:37 -05:00
rasbt
e5e6aaf9f1
flops analysis
2024-05-23 20:35:41 -05:00
rasbt
c735c21e87
fix swiglu acronym
2024-05-01 20:26:17 -05:00
Sebastian Raschka
97ed38116a
Rename drop_resid to drop_shortcut ( #136 )
2024-04-28 14:31:27 -05:00
rasbt
d202cabdee
update figures
2024-04-20 11:42:03 -05:00
Sebastian Raschka
dd51d4ad83
Make datasets and loaders compatible with multiprocessing ( #118 )
2024-04-13 13:57:56 -05:00
James Holcombe
05718c6b94
Use instance tokenizer ( #116 )
* Use instance tokenizer
* consistency updates
---------
Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>
2024-04-10 21:16:19 -04:00
rasbt
6de0417321
cleanup
2024-04-04 07:58:41 -05:00
Sebastian Raschka
2de60d1bfb
Rename variable to context_length to make it easier on readers ( #106 )
* rename to context length
* fix spacing
2024-04-04 07:27:41 -05:00
Sebastian Raschka
3829ccdb34
Remove redundant dropout in MLP module ( #105 )
2024-04-03 20:19:08 -05:00
rasbt
f24da86abe
title case
2024-03-27 07:30:09 -05:00
Sebastian Raschka
a2cd8436cb
Ch05 supplementary code ( #81 )
2024-03-19 09:26:26 -05:00