LLMs-from-scratch

mirror of https://github.com/rasbt/LLMs-from-scratch.git synced 2025-08-09 09:12:51 +00:00

Author	SHA1	Message	Date
rasbt	6f92909e58	Clarify extension installation	2024-09-12 07:59:17 -05:00
Sebastian Raschka	11f97a13b6	Simplify conda section	2024-09-11 21:17:59 -05:00
rasbt	f1accdf273	clarifications	2024-09-11 20:16:35 -05:00
rasbt	fe2136e7c9	update title	2024-09-10 21:43:32 -05:00
Sebastian Raschka	835ed29dbf	reflection-tuning dataset generation (#349 )	2024-09-10 21:42:12 -05:00
rasbt	8ad50a3315	update gpt-2 paper link	2024-09-09 06:31:28 -05:00
rasbt	1e48c13e89	update gpt-2 paper link	2024-09-08 15:49:44 -05:00
rasbt	b94546aa14	minor spelling fix	2024-09-08 15:35:36 -05:00
Daniel Kleine	2ee3df622e	nbviewer links / typo (#346 ) * fixed typo * removed remaining nbviewer links * Update mha-implementations.ipynb --------- Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>	2024-09-07 07:27:28 +02:00
Sebastian Raschka	ad12c8da06	Einsum multi-head attention (#345 ) * Einsum multi-head attention * update diff	2024-09-05 18:24:33 +02:00
Gustavo Monti	34e16991bb	updating README from chapter 02 including 04_bonus section (#344 ) * updating README from chapter 02 including 04_bonus section * Update ch02/README.md --------- Co-authored-by: Gustavo Monti Rocha <gustavo.rocha@intelliway.com.br> Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>	2024-09-05 08:09:46 +02:00
Sebastian Raschka	91db4e3a0f	Revert nbviewer links	2024-09-05 08:09:33 +02:00
rasbt	9de277421e	consistent header for appendix E	2024-08-30 08:08:01 +02:00
Sebastian Raschka	d391796ec2	use nbviewer links (#339 )	2024-08-29 09:09:10 +02:00
rasbt	3760adbd3d	refresh figures	2024-08-27 08:26:40 +02:00
Daniel Kleine	c7267c3b09	ch06/03 fixes (#336 ) * fixed bash commands * fixed help docstrings * added missing logreg bash cmd * Update train_bert_hf.py * Update train_bert_hf_spam.py * Update README.md --------- Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>	2024-08-27 08:23:25 +02:00
rasbt	91cdfe3309	sklearn baseline and roberta-large update	2024-08-26 10:31:54 +02:00
TITC	4f791e6cc2	add RoBERTa and params frozen (#335 ) * add roberta experiment result * add roberta & params frozen * Update README.md * modify lr * modify lr --------- Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>	2024-08-26 10:27:09 +02:00
Sebastian Raschka	26f94876f7	Update README.md	2024-08-24 07:22:18 -05:00
Sebastian Raschka	564362044a	add BERT experiment results (#333 ) * add BERT experiment results * cleanup * formatting	2024-08-23 08:40:40 -05:00
Sebastian Raschka	ed257789a4	grammar fix	2024-08-21 19:16:21 -05:00
Sebastian Raschka	c2117ca073	topk comment	2024-08-20 20:44:15 -05:00
rasbt	9f0bda7af5	add note about duplicated cell	2024-08-19 21:04:18 -05:00
Sebastian Raschka	01cb137bfd	Note about MPS devices (#329 )	2024-08-19 20:58:45 -05:00
Sebastian Raschka	c443035d56	Note about MPS in ch06 and ch07 (#325 )	2024-08-19 08:11:33 -05:00
Sebastian Raschka	8ef5022511	Note about ch05 mps support (#324 )	2024-08-19 07:40:24 -05:00
rasbt	e7cb2ebd8d	update lora experiments	2024-08-16 08:57:46 -05:00
rasbt	b2858a91c5	remove redundant indentation	2024-08-16 07:54:02 -05:00
TITC	d2befdf00e	typo fix (#323 )	2024-08-16 06:57:30 -05:00
TITC	f6526f7766	typo fix (#321 ) * typo fix experiment 5 is the same size as experiment 1, both are 124 Million. and experiment 6 is 355M ~3x 124M. * typo fix all layer FT vs all layer FT, "slightly worse by 1.3% " indicates it's exp 5 vs exp 9 * exp 5 vs 9 * Update ch06/02_bonus_additional-experiments/README.md --------- Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>	2024-08-15 08:45:34 -05:00
Sebastian Raschka	4ea526908e	Update README.md	2024-08-15 08:22:47 -05:00
Daniel Kleine	c65928f7dc	added std error bars (#320 ) * added std error bars * fixed changes * Update on A100 --------- Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>	2024-08-13 20:57:41 -05:00
Sebastian Raschka	9713e70a20	Consistency update in README.md	2024-08-13 07:33:10 -05:00
TITC	38390b2a8d	track tokens seen in chapter5, track examples seen in chapter6 (#319 )	2024-08-13 07:09:05 -05:00
rasbt	5f0c55ddee	fix code cell ordering	2024-08-12 19:04:05 -05:00
Jeroen Van Goey	76e6910a1a	Small typo fix (#313 ) * typo fix * Update ch03/02_bonus_efficient-multihead-attention/mha-implementations.ipynb * Update ch03/02_bonus_efficient-multihead-attention/mha-implementations.ipynb * Update ch03/02_bonus_efficient-multihead-attention/mha-implementations.ipynb --------- Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>	2024-08-12 07:54:12 -05:00
Eric Thomson	da5236ee72	Adds .vscode folder to .gitignore (#314 ) * added .vscode folder to .gitignore * Update .gitignore --------- Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>	2024-08-12 07:49:11 -05:00
Sebastian Raschka	3f6652d87e	update attention benchmarks (#307 )	2024-08-10 09:44:11 -05:00
Sebastian Raschka	7feb8cad86	Update README.md	2024-08-10 07:54:51 -05:00
Daniel Kleine	13dbc548f8	fixed bash command (#305 ) Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>	2024-08-09 21:29:04 -05:00
TITC	09a3a73f2d	remove all non-English texts and notice (#304 ) * remove all non-English texts and notice 1. almost 18GB txt left after `is_english` filtered. 2. remove notice use gutenberg's strip_headers 3. after re-run get_data.py, seems all data are under `gutenberg/data/.mirror` folder. * some improvements * update readme --------- Co-authored-by: rasbt <mail@sebastianraschka.com>	2024-08-09 17:09:14 -05:00
Sebastian Raschka	f1c3d451fe	Update README.md	2024-08-08 07:50:45 -05:00
Sebastian Raschka	81e9cea3d3	Update README.md	2024-08-08 07:47:31 -05:00
Sebastian Raschka	c5eaae11b1	Revert accidental edit	2024-08-06 19:54:34 -05:00
rasbt	06151a809e	note about logistic sigmoid	2024-08-06 19:48:30 -05:00
rasbt	26df0c474c	note about logistic sigmoid	2024-08-06 19:48:06 -05:00
rasbt	e810f9f004	extend equation description	2024-08-06 19:46:50 -05:00
rasbt	c8090f30ef	add more explanations	2024-08-06 19:45:11 -05:00
Sebastian Raschka	98d24a1607	Update README.md	2024-08-06 08:02:01 -05:00
TITC	d16527ddf2	total training iters may equal warmup_iters (#301 ) With total_training_iters=20 and warmup_iters=20 (both equal len(train_loader)=4 multiplied by n_epochs=5), a ZeroDivisionError occurred. ```shell Traceback (most recent call last): File "LLMs-from-scratch/ch05/05_bonus_hparam_tuning/hparam_search.py", line 191, in <module> train_loss, val_loss = train_model( ^^^^^^^^^^^^ File "/mnt/raid1/docker/ai/LLMs-from-scratch/ch05/05_bonus_hparam_tuning/hparam_search.py", line 90, in train_model progress = (global_step - warmup_iters) / (total_training_iters - warmup_iters) ~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ZeroDivisionError: division by zero ```	2024-08-06 07:10:05 -05:00
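
The failure reported in commit d16527ddf2 can be reproduced and guarded against with a small schedule function. The following is a minimal sketch, not the repository's actual fix: the function name `lr_at_step` and the parameters `peak_lr`, `initial_lr`, and `min_lr` are hypothetical, and the linear-warmup-then-cosine-decay shape is an assumption based on the `progress` expression in the traceback. Only `global_step`, `warmup_iters`, and `total_training_iters` come from the commit message.

```python
import math


def lr_at_step(global_step, warmup_iters, total_training_iters,
               peak_lr=1e-3, initial_lr=1e-5, min_lr=1e-6):
    """Hypothetical learning-rate schedule: linear warmup, then cosine decay.

    Guards the case total_training_iters == warmup_iters, which makes the
    unguarded `progress` expression from the traceback divide by zero.
    """
    # Linear warmup from initial_lr toward peak_lr.
    if warmup_iters > 0 and global_step < warmup_iters:
        return initial_lr + (peak_lr - initial_lr) * global_step / warmup_iters

    # Number of steps left for the decay phase after warmup.
    decay_iters = total_training_iters - warmup_iters
    if decay_iters <= 0:
        # Warmup covers the whole run: no decay phase, stay at peak_lr
        # (assumed behavior for this sketch).
        return peak_lr

    # Same expression as in the traceback, now with a nonzero denominator.
    progress = (global_step - warmup_iters) / decay_iters
    progress = min(max(progress, 0.0), 1.0)  # clamp to [0, 1]
    return min_lr + (peak_lr - min_lr) * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # The values from the commit message: both counts equal 20.
    print(lr_at_step(global_step=20, warmup_iters=20, total_training_iters=20))
```

Returning `peak_lr` when no decay steps remain is one design choice among several; skipping such hyperparameter combinations in the search would also avoid the error.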

732 Commits