Mirror of https://github.com/rasbt/LLMs-from-scratch.git, synced 2025-11-01 02:10:15 +00:00
minor fixes (#246)
* removed duplicated white spaces
* Update ch07/01_main-chapter-code/ch07.ipynb
* Update ch07/05_dataset-generation/llama3-ollama.ipynb
* removed duplicated white spaces
* fixed title again

---------

Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>
This commit is contained in:
parent
5629d4d147
commit
7a54d383e7
@@ -115,7 +115,7 @@ Several folders contain optional materials as a bonus for interested readers:
 
 ### Citation
 
-If you find this book or code useful for your research, please consider citing it:
+If you find this book or code useful for your research, please consider citing it:
 
 ```
 @book{build-llms-from-scratch-book,
@@ -1263,7 +1263,7 @@
 }
 ],
 "source": [
-"model = NeuralNetwork(2, 2) # needs to match the original model exactly\n",
+"model = NeuralNetwork(2, 2) # needs to match the original model exactly\n",
 "model.load_state_dict(torch.load(\"model.pth\"))"
 ]
 },
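For context, the cell in this hunk restores previously saved weights with `load_state_dict`. Below is a minimal, self-contained sketch of the full save/load round-trip; the simplified `NeuralNetwork` class is illustrative rather than the exact architecture used in the notebook, but the pattern is the same.

```python
import torch

class NeuralNetwork(torch.nn.Module):
    # Illustrative stand-in for the notebook's model; only the save/load pattern matters here.
    def __init__(self, num_inputs, num_outputs):
        super().__init__()
        self.layers = torch.nn.Sequential(
            torch.nn.Linear(num_inputs, 30),
            torch.nn.ReLU(),
            torch.nn.Linear(30, num_outputs),
        )

    def forward(self, x):
        return self.layers(x)

model = NeuralNetwork(2, 2)
torch.save(model.state_dict(), "model.pth")   # save only the learned parameters

model = NeuralNetwork(2, 2)                   # needs to match the original model exactly
model.load_state_dict(torch.load("model.pth"))
model.eval()                                  # switch to inference mode before evaluation
```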
@@ -1340,7 +1340,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.10.6"
+"version": "3.10.11"
 }
 },
 "nbformat": 4,
@@ -710,7 +710,7 @@
 "- `[UNK]` to represent words that are not included in the vocabulary\n",
 "\n",
 "- Note that GPT-2 does not need any of these tokens mentioned above but only uses an `<|endoftext|>` token to reduce complexity\n",
-"- The `<|endoftext|>` is analogous to the `[EOS]` token mentioned above\n",
+"- The `<|endoftext|>` is analogous to the `[EOS]` token mentioned above\n",
 "- GPT also uses the `<|endoftext|>` token for padding (since we typically use a mask when training on batched inputs, we would not attend to padded tokens anyway, so it does not matter what these tokens are)\n",
 "- GPT-2 does not use an `<UNK>` token for out-of-vocabulary words; instead, GPT-2 uses a byte-pair encoding (BPE) tokenizer, which breaks down words into subword units, which we will discuss in a later section\n",
 "\n"
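To make the `<|endoftext|>` and BPE points above concrete, here is a small sketch using the `tiktoken` library, which ships GPT-2's byte-pair-encoding tokenizer; the example strings are made up for illustration.

```python
import tiktoken  # pip install tiktoken

tokenizer = tiktoken.get_encoding("gpt2")

text = "Hello, do you like tea? <|endoftext|> In the sunlit terraces of someunknownPlace."
ids = tokenizer.encode(text, allowed_special={"<|endoftext|>"})
print(ids)                     # <|endoftext|> is encoded as a single token ID (50256)
print(tokenizer.decode(ids))   # round-trips back to the original text

# Unknown words are broken into subword units instead of being mapped to [UNK]:
print(tokenizer.encode("Akwirw ier"))
```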
@@ -520,7 +520,7 @@
 "- Note that we also add a smaller value (`eps`) before computing the square root of the variance; this is to avoid division-by-zero errors if the variance is 0\n",
 "\n",
 "**Biased variance**\n",
-"- In the variance calculation above, setting `unbiased=False` means using the formula $\\frac{\\sum_i (x_i - \\bar{x})^2}{n}$ to compute the variance where n is the sample size (here, the number of features or columns); this formula does not include Bessel's correction (which uses `n-1` in the denominator), thus providing a biased estimate of the variance\n",
+"- In the variance calculation above, setting `unbiased=False` means using the formula $\\frac{\\sum_i (x_i - \\bar{x})^2}{n}$ to compute the variance where n is the sample size (here, the number of features or columns); this formula does not include Bessel's correction (which uses `n-1` in the denominator), thus providing a biased estimate of the variance\n",
 "- For LLMs, where the embedding dimension `n` is very large, the difference between using n and `n-1`\n",
 " is negligible\n",
 "- However, GPT-2 was trained with a biased variance in the normalization layers, which is why we also adopted this setting for compatibility reasons with the pretrained weights that we will load in later chapters\n",
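A short numerical sketch of the biased-vs-unbiased point above; the tensor here is random toy data, not notebook output.

```python
import torch

torch.manual_seed(123)
x = torch.randn(2, 5)                          # 2 examples, 5 features each

var_biased = x.var(dim=-1, unbiased=False)     # divides by n (no Bessel's correction)
var_unbiased = x.var(dim=-1, unbiased=True)    # divides by n-1 (PyTorch's default)

# The two estimates differ only by a factor of n/(n-1), which approaches 1 as n grows:
n = x.shape[-1]
print(torch.allclose(var_unbiased, var_biased * n / (n - 1)))   # True
```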
@@ -1498,7 +1498,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.11.4"
+"version": "3.10.11"
 }
 },
 "nbformat": 4,
@@ -31,7 +31,7 @@
 "metadata": {},
 "source": [
 "- FLOPs (floating point operations) measure the computational complexity of neural network models by counting the number of floating-point operations executed\n",
-"- High FLOPs indicate more intensive computation and energy consumption"
+"- High FLOPs indicate more intensive computation and energy consumption"
 ]
 },
 {
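As a back-of-the-envelope illustration of what counting floating-point operations means, the sketch below estimates the cost of a single linear layer; the layer sizes are made-up, GPT-2-like values, and the commented-out `thop` call is only a pointer since its exact API may vary by version.

```python
# A matrix multiply with an (in_features x out_features) weight costs roughly
# 2 * in_features * out_features floating-point operations per token (multiply + add).
in_features, out_features = 768, 3072          # illustrative feed-forward layer sizes
flops_per_token = 2 * in_features * out_features
print(f"~{flops_per_token:,} FLOPs per token for this one layer")

# For whole models, packages such as thop report MACs/FLOPs automatically, e.g.:
# from thop import profile
# macs, params = profile(model, inputs=(example_batch,))
```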
@@ -1959,7 +1959,7 @@
 "id": "10e4c7f9-592f-43d6-a00e-598fa01dfb82",
 "metadata": {},
 "source": [
-"- The recommended way in PyTorch is to save the model weights, the so-called `state_dict`, by applying the `torch.save` function to the `.state_dict()` method:"
+"- The recommended way in PyTorch is to save the model weights, the so-called `state_dict`, by applying the `torch.save` function to the `.state_dict()` method:"
 ]
 },
 {
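One common extension of the `torch.save(model.state_dict(), ...)` pattern described above is to store the optimizer state alongside the model so training can be resumed later; a minimal sketch with a tiny placeholder model:

```python
import torch

model = torch.nn.Linear(4, 2)                                  # tiny placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)

# Save model and optimizer state together in one checkpoint file.
torch.save(
    {"model_state_dict": model.state_dict(),
     "optimizer_state_dict": optimizer.state_dict()},
    "checkpoint.pth",
)

# Restore both to continue training where it left off.
checkpoint = torch.load("checkpoint.pth")
model.load_state_dict(checkpoint["model_state_dict"])
optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
```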
@@ -2458,7 +2458,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.11.4"
+"version": "3.10.11"
 }
 },
 "nbformat": 4,
File diff suppressed because one or more lines are too long
@@ -267,7 +267,7 @@
 "Model saved as gpt2-medium355M-sft-phi3-prompt.pth\n",
 "```\n",
 "\n",
-"For comparison, you can run the original chapter 7 finetuning code via `python exercise_experiments.py --exercise_solution baseline`. \n",
+"For comparison, you can run the original chapter 7 finetuning code via `python exercise_experiments.py --exercise_solution baseline`. \n",
 "\n",
 "Note that on an Nvidia L4 GPU, the code above, using the Phi-3 prompt template, takes 1.5 min to run. In comparison, the Alpaca-style template takes 1.80 minutes to run. So, the Phi-3 template is approximately 17% faster since it results in shorter model inputs. \n",
 "\n",
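The speedup reported above comes from the Phi-3 prompt template producing shorter inputs than the Alpaca-style template. The sketch below illustrates that difference; the exact format strings in `exercise_experiments.py` may differ slightly from this approximation.

```python
def format_alpaca(entry):
    # Alpaca-style prompt as used in the chapter 7 baseline
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request."
        f"\n\n### Instruction:\n{entry['instruction']}"
        + (f"\n\n### Input:\n{entry['input']}" if entry["input"] else "")
    )

def format_phi3(entry):
    # Approximation of the more compact Phi-3 chat template
    return (
        f"<|user|>\n{entry['instruction']}"
        + (f"\n{entry['input']}" if entry["input"] else "")
    )

entry = {"instruction": "Rewrite the sentence in passive voice.",
         "input": "The chef cooks the meal every day."}
print(len(format_alpaca(entry)), len(format_phi3(entry)))   # the Phi-3 prompt is much shorter
```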
@@ -954,7 +954,7 @@
 "Model saved as gpt2-medium355M-sft-lora.pth\n",
 "```\n",
 "\n",
-"For comparison, you can run the original chapter 7 finetuning code via `python exercise_experiments.py --exercise_solution baseline`. \n",
+"For comparison, you can run the original chapter 7 finetuning code via `python exercise_experiments.py --exercise_solution baseline`. \n",
 "\n",
 "Note that on an Nvidia L4 GPU, the code above, using LoRA, takes 1.30 min to run. In comparison, the baseline takes 1.80 minutes to run. So, LoRA is approximately 28% faster.\n",
 "\n",
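For readers who have not seen LoRA before, the idea behind the speedup above is to freeze the pretrained weights and train only a small low-rank update. A minimal sketch follows; the class name, initialization, and `alpha/rank` scaling are illustrative and may differ from the exercise code.

```python
import math
import torch

class LinearWithLoRA(torch.nn.Module):
    def __init__(self, linear, rank=8, alpha=16):
        super().__init__()
        self.linear = linear                                      # pretrained layer, kept frozen
        in_dim, out_dim = linear.in_features, linear.out_features
        self.A = torch.nn.Parameter(torch.empty(in_dim, rank))
        self.B = torch.nn.Parameter(torch.zeros(rank, out_dim))   # zero init: no change at start
        torch.nn.init.kaiming_uniform_(self.A, a=math.sqrt(5))
        self.scaling = alpha / rank

    def forward(self, x):
        # frozen output plus the trainable low-rank correction
        return self.linear(x) + self.scaling * (x @ self.A @ self.B)

layer = torch.nn.Linear(768, 768)
for param in layer.parameters():
    param.requires_grad_(False)                                   # only A and B remain trainable
lora_layer = LinearWithLoRA(layer, rank=8, alpha=16)
out = lora_layer(torch.randn(2, 768))
```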
@@ -138,7 +138,7 @@
 "\n",
 "- After the download has been completed, you will see a command line prompt that allows you to chat with the model\n",
 "\n",
-"- Try a prompt like \"What do llamas eat?\", which should return an output similar to the following:\n",
+"- Try a prompt like \"What do llamas eat?\", which should return an output similar to the following:\n",
 "\n",
 "```\n",
 ">>> What do llamas eat?\n",
@@ -139,7 +139,7 @@
 "\n",
 "- After the download has been completed, you will see a command line prompt that allows you to chat with the model\n",
 "\n",
-"- Try a prompt like \"What do llamas eat?\", which should return an output similar to the following:\n",
+"- Try a prompt like \"What do llamas eat?\", which should return an output similar to the following:\n",
 "\n",
 "```\n",
 ">>> What do llamas eat?\n",
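Besides the interactive `ollama run` prompt shown in these two notebooks, the same question can be sent to a locally running Ollama server programmatically. A hedged sketch using the REST API on its default port; the model name and prompt are just the example from above, and the server must already be running with the model pulled.

```python
import json
import urllib.request

def query_model(prompt, model="llama3", url="http://localhost:11434/api/chat"):
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,                       # ask for one JSON response instead of a stream
    }
    request = urllib.request.Request(
        url, data=json.dumps(payload).encode("utf-8"), method="POST"
    )
    request.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["message"]["content"]

print(query_model("What do llamas eat?"))
```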