From 90b25ece3dd2e33f6d88a2d00aecb8ca8fcb9df1 Mon Sep 17 00:00:00 2001
From: Daniel Kleine <53251018+d-kleine@users.noreply.github.com>
Date: Wed, 3 Jul 2024 14:47:33 +0200
Subject: [PATCH] fixed spelling typos (#258)

---
 ch06/02_bonus_additional-experiments/README.md | 2 +-
 ch06/03_bonus_imdb-classification/README.md    | 2 +-
 ch07/01_main-chapter-code/ch07.ipynb           | 6 +++---
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/ch06/02_bonus_additional-experiments/README.md b/ch06/02_bonus_additional-experiments/README.md
index d0424e0..da43af5 100644
--- a/ch06/02_bonus_additional-experiments/README.md
+++ b/ch06/02_bonus_additional-experiments/README.md
@@ -65,4 +65,4 @@ I've kept the LLM and dataset small on purpose, so you can run the training on a
 8. **Padding Input to Full Context Length vs. Longest Training Example (Row 1 vs. 11)**: Padding the input to the full supported context length results is significantly worse.
 9. **Padding vs no padding (Row 1 vs. 12 and 13)**: The `--no_padding` option disables the padding in the dataset, which requires training the model with a batch size of 1 since the inputs have variable lengths. This results in a better test accuracy but takes longer to train. In row 12, we additionally enable gradient accumulation with 8 steps to achieve the same batch size as in the other experiments, which helps reduce overfitting and slightly boost the test set accuracy.
 10. **Disabling the causal attention mask (Row 1 vs. 14)**: Disables the causal attention mask used in the multi-head attention module. This means all tokens can attend all other tokens. The model accuracy is slightly improved compared to the GPT model with causal mask.
-11. **Ignoring the padding indeces in the loss and backpropagation (Row 1 vs. 15)**: Setting `--ignore_index 50256` excludes the `|endoftext|` padding tokens in the `cross_entropy` loss function in PyTorch. In this case, it does not have any effect because we replaced the output layers so that the token IDs are either 0 or 1 for the binary classification example. However, this setting is useful when instruction finetuning models in chapter 7.
+11. **Ignoring the padding indices in the loss and backpropagation (Row 1 vs. 15)**: Setting `--ignore_index 50256` excludes the `|endoftext|` padding tokens in the `cross_entropy` loss function in PyTorch. In this case, it does not have any effect because we replaced the output layers so that the token IDs are either 0 or 1 for the binary classification example. However, this setting is useful when instruction finetuning models in chapter 7.
diff --git a/ch06/03_bonus_imdb-classification/README.md b/ch06/03_bonus_imdb-classification/README.md
index 9027088..24e75f0 100644
--- a/ch06/03_bonus_imdb-classification/README.md
+++ b/ch06/03_bonus_imdb-classification/README.md
@@ -107,7 +107,7 @@ python train-bert-hf.py --bert_model roberta
 
 ---
 
-A scikit-learn Logistic Regression model as a basline.
+A scikit-learn Logistic Regression model as a baseline.
 
 ```bash
 python train-sklearn-logreg.py
diff --git a/ch07/01_main-chapter-code/ch07.ipynb b/ch07/01_main-chapter-code/ch07.ipynb
index 1e1976b..284faa3 100644
--- a/ch07/01_main-chapter-code/ch07.ipynb
+++ b/ch07/01_main-chapter-code/ch07.ipynb
@@ -2050,7 +2050,7 @@
     " - human preference comparison to other LLMs, such as LMSYS chatbot arena ([https://arena.lmsys.org](https://arena.lmsys.org))\n",
     " - automated conversational benchmarks, where another LLM like GPT-4 is used to evaluate the responses, such as AlpacaEval ([https://tatsu-lab.github.io/alpaca_eval/](https://tatsu-lab.github.io/alpaca_eval/))\n",
     "\n",
-    "- In the next section, we will use an approach similar to AlpaceEval and use another LLM to evaluate the responses of our model; however, we will use our own test set instead of using a publicly available benchmark dataset\n",
+    "- In the next section, we will use an approach similar to AlpacaEval and use another LLM to evaluate the responses of our model; however, we will use our own test set instead of using a publicly available benchmark dataset\n",
     "- For this, we add the model response to the `test_data` dictionary and save it as a `\"instruction-data-with-response.json\"` file for record-keeping so that we can load and analyze it in separate Python sessions if needed"
    ]
   },
@@ -2702,7 +2702,7 @@
    "## Summary and takeaways\n",
    "\n",
    "- See the [./gpt_instruction_finetuning.py](./gpt_instruction_finetuning.py) script, a self-contained script for classification finetuning\n",
-   "- [./ollama_evaluate.py](./ollama_evaluate.py) is a standalonw script based on section 7.8 that evaluates a JSON file containing \"output\" and \"response\" keys via Ollama and Llama 3\n",
+   "- [./ollama_evaluate.py](./ollama_evaluate.py) is a standalone script based on section 7.8 that evaluates a JSON file containing \"output\" and \"response\" keys via Ollama and Llama 3\n",
    "- The [./load-finetuned-model.ipynb](./load-finetuned-model.ipynb) notebook illustrates how to load the finetuned model in a new session\n",
    "- You can find the exercise solutions in [./exercise-solutions.ipynb](./exercise-solutions.ipynb)"
   ]
  },
@@ -2730,7 +2730,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-  "version": "3.11.4"
+  "version": "3.10.11"
  }
 },
 "nbformat": 4,
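Note on the `--ignore_index 50256` setting touched by the first hunk: PyTorch's `cross_entropy` skips every target position whose label equals `ignore_index`, so padding tokens contribute neither to the loss average nor to the gradients during backpropagation. A minimal sketch of that mechanism follows; the toy logits and token IDs are invented for illustration, with 50256 being the GPT-2 `<|endoftext|>` token used as padding.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(123)

# Toy logits for 4 token positions over the GPT-2 vocabulary (50,257 tokens)
logits = torch.randn(4, 50257)

# Toy targets where 50256 (<|endoftext|>) marks padded positions
targets = torch.tensor([464, 2061, 50256, 50256])

# Default: padded positions are averaged into the loss like any other token
loss_with_padding = F.cross_entropy(logits, targets)

# With ignore_index, positions labeled 50256 are excluded from the loss
# average and receive no gradient during backpropagation
loss_ignoring_padding = F.cross_entropy(logits, targets, ignore_index=50256)

print(loss_with_padding, loss_ignoring_padding)
```

As the patched README item notes, the classification experiments replace the output head so the targets are only 0 or 1; no target ever equals 50256 there, which is why the setting only matters for the instruction-finetuning setup in chapter 7.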