165 Commits

Author SHA1 Message Date
Sebastian Raschka
2e143f17b8
Adjust comment to save compiled model (#583) 2025-03-27 10:43:45 -05:00
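The commit above tweaks a comment about saving a model wrapped with torch.compile. As a general, hedged illustration of that pattern (the tiny model and file name below are placeholders, not the repository's code):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 4)                 # placeholder for the book's GPT model
compiled_model = torch.compile(model)   # requires PyTorch >= 2.0

# ... training would use `compiled_model` here ...

# Saving the state_dict of the original module (not the compiled wrapper)
# avoids checkpoint keys being prefixed (e.g., "_orig_mod.") in some PyTorch
# versions, so the weights load cleanly into an uncompiled model later.
torch.save(model.state_dict(), "model.pth")
```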
Sebastian Raschka
92f1313e00
Vocab padding clarification (#582)
* vocab padding clarification

* Update ch05/10_llm-training-speed/README.md
2025-03-26 13:19:55 -05:00
Sebastian Raschka
b7893457da
More explicit torchrun usage doc (#578) 2025-03-24 12:01:03 -05:00
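For readers unfamiliar with torchrun, a minimal sketch of the kind of script it launches (not the repository's training script; the backend choice and the print are illustrative):

```python
# Launch with, for example:  torchrun --nproc_per_node=2 this_script.py
import os
import torch
import torch.distributed as dist

def main():
    # torchrun sets RANK, LOCAL_RANK, WORLD_SIZE, MASTER_ADDR/PORT for us
    dist.init_process_group(backend="nccl" if torch.cuda.is_available() else "gloo")
    rank = int(os.environ["RANK"])
    print(f"Process {rank} of {dist.get_world_size()} is up")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```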
Sebastian Raschka
feb1e9a83d
Add readme (#577) 2025-03-23 19:35:12 -05:00
Sebastian Raschka
c21bfe4a23
Add PyPI package (#576)
* Add PyPI package

* fixes

* fixes
2025-03-23 19:28:49 -05:00
Sebastian Raschka
7757c3d308
Speed comparison figure (#575) 2025-03-21 11:29:49 -05:00
Greg Gandenberger
c1611d4ea8
Fix minor printing issue and note inconsistency across platforms (#563)
* Fix printing issue and note inconsistency

* Rerun notebook
2025-03-14 15:12:09 -05:00
Sebastian Raschka
86b714a5e0
Specify UTF-8 encoding in the json load command explicitly (#557) 2025-03-05 11:46:21 -06:00
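The gist of that change, as a minimal sketch (the file name is illustrative, not necessarily the one used in the chapter):

```python
import json

# Passing encoding="utf-8" explicitly avoids platform-dependent defaults
# (e.g., cp1252 on some Windows setups) when reading the JSON data.
with open("instruction-data.json", "r", encoding="utf-8") as f:
    data = json.load(f)
```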
Sebastian Raschka
5fc78ff9fd
Add a note about "zsh: illegal hardware instruction python" error (#555) 2025-03-02 15:18:24 -06:00
Sebastian Raschka
f12b899d96
GitHub markdown updates (#545)
* GitHub markdown updates

* Apply suggestions from code review

* Apply suggestions from code review
2025-02-23 12:25:44 -06:00
Sebastian Raschka
a08d7aaa84
Uv workflow improvements (#531)
* Uv workflow improvements

* Uv workflow improvements

* linter improvements

* pyproject.toml fixes

* windows fixes

* win32 fix
2025-02-16 13:16:51 -06:00
Sebastian Raschka
074a6efb33
Update link to vocab size increase (#526)
* Update link to vocab size increase

* Update ch05/10_llm-training-speed/README.md

* Update ch05/10_llm-training-speed/README.md
2025-02-14 08:03:01 -06:00
Sebastian Raschka
908dd2f71e
PyTorch tips for better training performance (#525)
* PyTorch tips for better training performance

* formatting

* pep 8
2025-02-12 16:10:34 -06:00
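The referenced guide collects training-speed tips; the sketch below shows a few typical PyTorch knobs of that kind, chosen here as illustrative examples rather than copied from the repository:

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(1024, 1024).to(device)            # stand-in for the GPT model

torch.set_float32_matmul_precision("high")          # allow TF32 matmuls on Ampere+ GPUs
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4,
                              fused=torch.cuda.is_available())  # fused CUDA kernel if available

x = torch.randn(8, 1024, device=device)
with torch.autocast(device_type=device, dtype=torch.bfloat16):  # mixed-precision forward pass
    loss = model(x).mean()
loss.backward()
optimizer.step()
```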
Sebastian Raschka
a6cc574605
Upgrade to NumPy 2.0 (#520)
* Upgrade to NumPy 2.0

* bump pytorch

* update

* update packages
2025-02-09 06:21:58 -06:00
Sebastian Raschka
68e2efe1c9
Mention small discrepancy due to Dropout non-reproducibility in PyTorch (#519)
* Mention small discrepancy due to Dropout non-reproducibility in PyTorch

* bump pytorch version
2025-02-06 14:59:52 -06:00
Sebastian Raschka
25ea71e713
Alternative weight loading via .safetensors (#507) 2025-01-29 08:15:29 -06:00
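As a hedged sketch of loading weights from a .safetensors file (a stand-in linear layer and a made-up file name instead of the notebook's GPT model):

```python
import torch.nn as nn
from safetensors.torch import load_file, save_file  # pip install safetensors

model = nn.Linear(4, 4)                              # stand-in for the GPT model
save_file(model.state_dict(), "model.safetensors")   # create a demo checkpoint

# Loading from .safetensors avoids pickle entirely, unlike plain torch.load
state_dict = load_file("model.safetensors")
model.load_state_dict(state_dict)
```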
Sebastian Raschka
a22d612be6
Bonus material: extending tokenizers (#496)
* Bonus material: extending tokenizers

* small wording update
2025-01-22 09:26:54 -06:00
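One common way to extend a tiktoken BPE tokenizer with an extra special token, following the pattern shown in tiktoken's own README (the token name here is made up and may differ from the bonus material):

```python
import tiktoken

base = tiktoken.get_encoding("gpt2")

custom = tiktoken.Encoding(
    name="gpt2_custom",
    pat_str=base._pat_str,
    mergeable_ranks=base._mergeable_ranks,
    # Append one new special token at the next free token ID
    special_tokens={**base._special_tokens, "<|mytoken|>": base.n_vocab},
)
print(custom.encode("hello <|mytoken|>", allowed_special={"<|mytoken|>"}))
```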
Sebastian Raschka
4bfbcd069d
Auto download DPO dataset if not already available in path (#479)
* Auto download DPO dataset if not already available in path

* update tests to account for latest HF transformers release in unit tests

* pep 8
2025-01-12 12:27:28 -06:00
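The download-if-missing pattern, sketched with a placeholder URL and file name (the real ones are defined in the DPO notebook):

```python
import os
import urllib.request

url = "https://example.com/instruction-data-with-preference.json"   # placeholder URL
file_path = "instruction-data-with-preference.json"                 # placeholder name

if not os.path.exists(file_path):   # skip the download if the file is already present
    urllib.request.urlretrieve(url, file_path)
```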
Sebastian Raschka
701090815e
Add backup URL for gpt2 weights (#469)
* Add backup URL for gpt2 weights

* newline
2025-01-05 11:28:09 -06:00
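A hedged sketch of the fallback idea (both URLs are placeholders, not the actual weight mirrors):

```python
import urllib.request

primary_url = "https://example.com/gpt2/model.ckpt"         # placeholder
backup_url = "https://mirror.example.com/gpt2/model.ckpt"   # placeholder

try:
    urllib.request.urlretrieve(primary_url, "model.ckpt")
except Exception:
    # Fall back to the mirror if the primary host is unreachable
    urllib.request.urlretrieve(backup_url, "model.ckpt")
```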
Sebastian Raschka
1b635f760e
fix misplaced parenthesis and update license (#466) 2025-01-04 11:14:08 -06:00
casinca
bb31de8999
[minor] typo & comments (#441)
* typo & comment

- safe -> save
- commenting code: batch_size, seq_len = in_idx.shape

* comment

- adding # NEW for assert num_heads % num_kv_groups == 0

* update memory wording

---------

Co-authored-by: rasbt <mail@sebastianraschka.com>
2024-11-18 19:52:42 +09:00
Sebastian Raschka
f61c008c5d
Add missing device transfer in gpt_generate.py (#436) 2024-11-14 19:12:53 +09:00
Daniel Kleine
81eed9afe2
updated RoPE statement (#423)
* updated RoPE statement

* updated .gitignore

* Update ch05/07_gpt_to_llama/converting-gpt-to-llama2.ipynb

---------

Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>
2024-10-30 08:00:08 -05:00
ROHAN WINSOR
cd24a27161
Fix argument name in LlamaTokenizer constructor (#421)
This PR addresses an oversight in the LlamaTokenizer class by changing the constructor argument from filepath to tokenizer_file.
2024-10-29 18:01:36 -05:00
Daniel Kleine
e8c2f962e9
minor fixes: Llama 3.2 standalone (#420)
* minor fixes

* reformat rope base as float

---------

Co-authored-by: rasbt <mail@sebastianraschka.com>
2024-10-25 21:08:06 -05:00
Sebastian Raschka
1516de54a5
RoPE theta rescaling (#419)
* rope fixes

* update

* update

* cleanup
2024-10-25 15:27:23 -05:00
Daniel Kleine
5ff72c2850
fixed typos (#414)
* fixed typos

* fixed formatting

* Update ch03/02_bonus_efficient-multihead-attention/mha-implementations.ipynb

* del weights after load into model

---------

Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>
2024-10-24 18:23:53 -05:00
Daniel Kleine
d38083c401
Updated Llama 2 to 3 paths (#413)
* llama 2 and 3 path fixes

* updated llama 3, 3.1 and 3.2 paths

* updated .gitignore

* Typo fix

---------

Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>
2024-10-24 07:40:08 -05:00
Sebastian Raschka
e1dfd2cb7a
Update test-requirements-extra.txt 2024-10-23 19:19:58 -05:00
Sebastian Raschka
7cd6a670ed
RoPE updates (#412)
* RoPE updates

* Apply suggestions from code review

* updates

* updates

* updates
2024-10-23 18:07:49 -05:00
Sebastian Raschka
4f9c9fb703
Update tests.py 2024-10-23 07:48:33 -05:00
Sebastian Raschka
534a704364
RoPE increase (#407) 2024-10-21 19:58:38 -05:00
rasbt
cd2753a36d
update mmap section 2024-10-14 14:27:19 -05:00
rasbt
08362fd290
add mmap=True comparison 2024-10-14 11:09:55 -05:00
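A minimal sketch of what an mmap=True load looks like (the tiny model and file name are placeholders; PyTorch >= 2.1 is assumed for mmap and assign):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 4)                         # stand-in for the GPT model
torch.save(model.state_dict(), "model.pth")     # create a demo checkpoint

# mmap=True memory-maps the checkpoint so tensors are paged in lazily,
# and assign=True avoids an extra copy when populating the parameters.
state = torch.load("model.pth", mmap=True, weights_only=True)
model.load_state_dict(state, assign=True)
```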
Sebastian Raschka
05b04f2a5a
Memory efficient weight loading (#401)
* memory efficient weight loading

* remove unused code
2024-10-14 10:30:25 -05:00
Sebastian Raschka
b6c4b2f9f1
Update bonus section formatting (#400) 2024-10-12 10:26:08 -05:00
Sebastian Raschka
ec18b6a8a3
Add Llama 3.2 RoPE to CI (#391)
* add Llama 3.2 RoPE to CI

* update
2024-10-08 08:28:34 -05:00
Sebastian Raschka
1eb0b3810a
Introduce buffers to improve Llama 3.2 efficiency (#389)
* Introduce buffers to improve Llama 3.2 efficiency

* update

* update
2024-10-06 12:49:04 -05:00
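Registering precomputed RoPE cos/sin values as buffers is the general idea; a simplified sketch (dimensions and theta base are arbitrary, not the Llama 3.2 values):

```python
import torch
import torch.nn as nn

class RoPECache(nn.Module):
    """Illustrative module; not the repository's actual implementation."""
    def __init__(self, head_dim=8, context_length=16):
        super().__init__()
        inv_freq = 1.0 / (10_000 ** (torch.arange(0, head_dim, 2).float() / head_dim))
        angles = torch.outer(torch.arange(context_length).float(), inv_freq)
        # Buffers move with .to(device) and can be shared across layers
        # instead of being recomputed for each transformer block.
        self.register_buffer("cos", torch.cos(angles), persistent=False)
        self.register_buffer("sin", torch.sin(angles), persistent=False)

cache = RoPECache()
print(cache.cos.shape)  # torch.Size([16, 4])
```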
Daniel Kleine
a0c0c765a8
fixed Llama 2 to 3.2 NBs (#388)
* updated requirements

* fixes llama2 to llama3

* fixed llama 3.2 standalone

* fixed typo

* fixed rope formula

* Update requirements-extra.txt

* Update ch05/07_gpt_to_llama/converting-llama2-to-llama3.ipynb

* Update ch05/07_gpt_to_llama/converting-llama2-to-llama3.ipynb

* Update ch05/07_gpt_to_llama/standalone-llama32.ipynb

---------

Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>
2024-10-06 09:56:55 -05:00
Sebastian Raschka
0972ded530
Add a note about weight tying in Llama 3.2 (#386) 2024-10-05 09:20:54 -05:00
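Weight tying itself is a small trick; a minimal sketch with toy dimensions (not the Llama 3.2 sizes):

```python
import torch.nn as nn

vocab_size, emb_dim = 100, 32
tok_emb = nn.Embedding(vocab_size, emb_dim)
out_head = nn.Linear(emb_dim, vocab_size, bias=False)

# Weight tying: the output projection reuses the token-embedding matrix,
# which saves parameters in small models such as Llama 3.2 1B.
out_head.weight = tok_emb.weight
```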
Sebastian Raschka
8a448a4410
Llama 3 (#384)
* Implement Llama 3.2

* Add Llama 3.2 files

* exclude IMDB link because stanford website seems down
2024-10-05 07:52:15 -05:00
Sebastian Raschka
8553644440
Llama 3.2 requirements file 2024-10-05 07:32:43 -05:00
Sebastian Raschka
b44096acef
Implement Llama 3.2 (#383) 2024-10-05 07:30:47 -05:00
Sebastian Raschka
a5405c255d
Cos-sin fix in Llama 2 bonus notebook (#381) 2024-10-03 20:45:40 -05:00
Sebastian Raschka
b993c2b25b
Improve rope settings for llama3 (#380) 2024-10-03 08:29:54 -05:00
rasbt
278a50a348
add section numbers 2024-09-30 08:42:22 -05:00
rasbt
bfa4215774
llama note 2024-09-26 07:41:11 -05:00
Sebastian Raschka
b56d0b2942
Add llama2 unit tests (#372)
* add llama2 unit tests

* update

* updates

* updates

* update file path

* update requirements file

* rmsnorm test

* update
2024-09-25 19:40:36 -05:00
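One of the bullets mentions an RMSNorm test; a hedged sketch of such a test, comparing a hand-rolled RMSNorm against torch.nn.RMSNorm (available in PyTorch >= 2.4), with made-up dimensions:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """A minimal RMSNorm for illustration (not the repository's exact class)."""
    def __init__(self, dim, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x / rms)

def test_rmsnorm_matches_pytorch():
    x = torch.randn(2, 8, 16)
    ours = RMSNorm(16, eps=1e-5)
    ref = nn.RMSNorm(16, eps=1e-5)
    assert torch.allclose(ours(x), ref(x), atol=1e-6)

if __name__ == "__main__":
    test_rmsnorm_matches_pytorch()
```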
rasbt
a6d8e93da3
improve formatting 2024-09-24 18:49:17 -05:00
Daniel Kleine
ff31b345b0
ch05/07 gpt_to_llama text improvements (#369)
* fixed typo

* fixed RMSnorm formula

* fixed SwiGLU formula

* temperature=0 for untrained model for reproducibility

* added extra info hf token
2024-09-24 18:45:49 -05:00