LLMs-from-scratch

mirror of https://github.com/rasbt/LLMs-from-scratch.git synced 2025-10-31 01:41:26 +00:00

Author	SHA1	Message	Date
TITC	d16527ddf2	total training iters may equal warmup_iters (#301) With total_training_iters=20 and warmup_iters=20, where 20 = len(train_loader) (4) multiplied by n_epochs (5), a ZeroDivisionError occurs. ```shell Traceback (most recent call last): File "LLMs-from-scratch/ch05/05_bonus_hparam_tuning/hparam_search.py", line 191, in <module> train_loss, val_loss = train_model( File "/mnt/raid1/docker/ai/LLMs-from-scratch/ch05/05_bonus_hparam_tuning/hparam_search.py", line 90, in train_model progress = (global_step - warmup_iters) / (total_training_iters - warmup_iters) ZeroDivisionError: division by zero ```	2024-08-06 07:10:05 -05:00
Daniel Kleine	dcbdc1d2e5	fixes for code (#206) * updated .gitignore * removed unused GELU import * fixed model_configs, fixed all tensors on same device * removed unused tiktoken * update * update hparam search * remove redundant tokenizer argument --------- Co-authored-by: rasbt <mail@sebastianraschka.com>	2024-06-11 20:59:48 -05:00
rasbt	6f0a5c320b	fix learning rate scheduler	2024-06-03 07:06:42 -05:00
rasbt	b40c260859	update how to retrieve learning rate	2024-05-23 17:19:01 -05:00
Sebastian Raschka	c70ddff558	Return nan if val loader is empty (#124)	2024-04-20 08:02:30 -05:00
Sebastian Raschka	dd51d4ad83	Make datasets and loaders compatible with multiprocessing (#118)	2024-04-13 13:57:56 -05:00
Sebastian Raschka	2de60d1bfb	Rename variable to context_length to make it easier on readers (#106) * rename to context length * fix spacing	2024-04-04 07:27:41 -05:00
rasbt	88b2dd780a	make batch loss calculation more efficient	2024-03-27 07:11:56 -05:00
rasbt	3cb5a52a1b	simplify calc_loss_loader	2024-03-26 20:34:50 -05:00
Sebastian Raschka	cf39abac04	Add and link bonus material (#84)	2024-03-23 07:27:43 -05:00

10 Commits
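
The ZeroDivisionError reported in commit d16527ddf2 comes from the expression `(global_step - warmup_iters) / (total_training_iters - warmup_iters)`, whose denominator is zero when the warmup covers the entire run. The following is a minimal sketch of one way such a schedule could be guarded. It is not the change made in that commit. The function name `lr_at_step`, the parameters `initial_lr`, `peak_lr`, and `min_lr`, their default values, and the choice of linear warmup followed by cosine decay are assumptions made for illustration.

```python
import math


def lr_at_step(global_step, warmup_iters, total_training_iters,
               initial_lr=1e-5, peak_lr=1e-3, min_lr=1e-6):
    """Hypothetical helper: linear warmup followed by cosine decay."""
    if warmup_iters > 0 and global_step < warmup_iters:
        # Linear warmup from initial_lr toward peak_lr.
        return initial_lr + (peak_lr - initial_lr) * global_step / warmup_iters

    decay_iters = total_training_iters - warmup_iters
    if decay_iters <= 0:
        # Warmup spans the whole run, so there is no decay phase.
        # Holding the peak rate avoids dividing by zero.
        return peak_lr

    # Fraction of the decay phase completed, clamped to 1.0.
    progress = min((global_step - warmup_iters) / decay_iters, 1.0)
    return min_lr + (peak_lr - min_lr) * 0.5 * (1 + math.cos(math.pi * progress))
```

With the values from the report (`warmup_iters=20`, `total_training_iters=20`), calling `lr_at_step(20, 20, 20)` takes the guarded branch and returns `peak_lr` instead of raising. Holding the peak rate is only one possible policy; raising a descriptive `ValueError` for such a configuration would be an equally reasonable choice.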