Chapter 5: Pretraining on Unlabeled Data

Main Chapter Code

  • 01_main-chapter-code contains the main chapter code

Bonus Materials

  • 02_alternative_weight_loading contains code to load the GPT model weights from alternative sources in case the model weights become unavailable from OpenAI
  • 03_bonus_pretraining_on_gutenberg contains code to pretrain the LLM longer on the whole corpus of books from Project Gutenberg
  • 04_learning_rate_schedulers contains code implementing a more sophisticated training function, including a learning rate scheduler and gradient clipping (see the scheduler sketch after this list)
  • 05_bonus_hparam_tuning contains an optional hyperparameter tuning script
  • 06_user_interface implements an interactive user interface for chatting with the pretrained LLM
  • 07_gpt_to_llama contains a step-by-step guide for converting a GPT architecture implementation to Llama 3.2 and for loading pretrained weights from Meta AI (see the RoPE sketch after this list)
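
As a taste of what 04_learning_rate_schedulers covers, below is a minimal sketch of a common pattern: linear learning rate warmup followed by cosine decay, combined with gradient clipping. The names (get_lr, train_step, warmup_steps) and the hyperparameter values are illustrative assumptions, not the exact ones used in the bonus notebook:

```python
import math
import torch

def get_lr(step, max_lr=5e-4, min_lr=5e-5, warmup_steps=100, total_steps=1000):
    # Linear warmup from 0 up to max_lr for the first warmup_steps steps
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    # Afterwards, cosine decay from max_lr down to min_lr
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

def train_step(model, optimizer, input_batch, target_batch, step):
    # Hypothetical single training step for next-token prediction
    optimizer.zero_grad()
    logits = model(input_batch)  # (batch, seq_len, vocab_size)
    loss = torch.nn.functional.cross_entropy(
        logits.flatten(0, 1), target_batch.flatten()
    )
    loss.backward()
    # Gradient clipping keeps the global gradient norm at or below 1.0
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    # Apply the scheduled learning rate before the optimizer step
    lr = get_lr(step)
    for param_group in optimizer.param_groups:
        param_group["lr"] = lr
    optimizer.step()
    return loss.item()
```

Clipping before the optimizer step bounds the update size early in training, which is when loss spikes are most likely; the warmup phase serves the same purpose from the learning rate side.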
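One of the central changes walked through in 07_gpt_to_llama is replacing GPT's learned absolute position embeddings with rotary position embeddings (RoPE). The sketch below shows the basic "rotate-half" mechanism under simplified assumptions; it omits the frequency-scaling adjustments Llama 3.2 applies for long contexts, and the function names are illustrative:

```python
import torch

def compute_rope_params(head_dim, theta_base=10_000, context_length=4096):
    # Inverse frequencies, one per pair of embedding dimensions
    inv_freq = 1.0 / (theta_base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(context_length)
    angles = positions[:, None] * inv_freq[None, :]  # (context_length, head_dim // 2)
    angles = torch.cat([angles, angles], dim=1)      # (context_length, head_dim)
    return torch.cos(angles), torch.sin(angles)

def apply_rope(x, cos, sin):
    # x: (batch, num_heads, seq_len, head_dim)
    seq_len = x.shape[2]
    x1, x2 = x.chunk(2, dim=-1)             # Split into two halves
    rotated = torch.cat([-x2, x1], dim=-1)  # "Rotate half" trick
    return x * cos[:seq_len] + rotated * sin[:seq_len]

# Example usage on random queries for a single attention head
cos, sin = compute_rope_params(head_dim=64)
queries = torch.randn(1, 1, 10, 64)  # (batch, heads, seq_len, head_dim)
queries_rot = apply_rope(queries, cos, sin)
```

Because RoPE rotates the query and key vectors by position-dependent angles rather than adding a learned embedding to the token representations, relative positions fall out of the attention dot product directly, and no absolute position embedding table is needed.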