mirror of
https://github.com/rasbt/LLMs-from-scratch.git
synced 2025-09-25 16:17:10 +00:00
Update README.md
This commit is contained in:
parent
45e7826954
commit
d4989e01c5
@ -118,4 +118,4 @@ Note that this code focuses on keeping things simple and minimal for educational
|
||||
5. Add a more advanced logger (for example, Weights and Biases) to view the loss and validation curves live
|
||||
6. Add distributed data parallelism (DDP) and train the model on multiple GPUs (see section *A.9.3 Training with multiple GPUs* in appendix A; [DDP-script.py](../../appendix-A/03_main-chapter-code/DDP-script.py)).
|
||||
7. Swap the from scratch `MultiheadAttention` class in the `previous_chapter.py` script with the efficient `MHAPyTorchScaledDotProduct` class implemented in the [Efficient Multi-Head Attention Implementations](../../ch03/02_bonus_efficient-multihead-attention/mha-implementations.ipynb) bonus section, which uses Flash Attention via PyTorch's `nn.functional.scaled_dot_product_attention` function.
|
||||
|
||||
8. Speeding up the training by optimizing the model via [torch.compile](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) (`model = torch.compile`) or [thunder](https://github.com/Lightning-AI/lightning-thunder) (`model = thunder.jit(model)`).
|
||||
|
Loading…
x
Reference in New Issue
Block a user