From 90b25ece3dd2e33f6d88a2d00aecb8ca8fcb9df1 Mon Sep 17 00:00:00 2001
From: Daniel Kleine <53251018+d-kleine@users.noreply.github.com>
Date: Wed, 3 Jul 2024 14:47:33 +0200
Subject: [PATCH] fixed spelling typos (#258)

---
 ch06/02_bonus_additional-experiments/README.md | 2 +-
 ch06/03_bonus_imdb-classification/README.md    | 2 +-
 ch07/01_main-chapter-code/ch07.ipynb           | 6 +++---
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/ch06/02_bonus_additional-experiments/README.md b/ch06/02_bonus_additional-experiments/README.md
index d0424e0..da43af5 100644
--- a/ch06/02_bonus_additional-experiments/README.md
+++ b/ch06/02_bonus_additional-experiments/README.md
@@ -65,4 +65,4 @@ I've kept the LLM and dataset small on purpose, so you can run the training on a
 8. **Padding Input to Full Context Length vs. Longest Training Example (Row 1 vs. 11)**: Padding the input to the full supported context length results is significantly worse.
 9. **Padding vs no padding (Row 1 vs. 12 and 13)**: The `--no_padding` option disables the padding in the dataset, which requires training the model with a batch size of 1 since the inputs have variable lengths. This results in a better test accuracy but takes longer to train. In row 12, we additionally enable gradient accumulation with 8 steps to achieve the same batch size as in the other experiments, which helps reduce overfitting and slightly boost the test set accuracy.
 10. **Disabling the causal attention mask (Row 1 vs. 14)**: Disables the causal attention mask used in the multi-head attention module. This means all tokens can attend all other tokens. The model accuracy is slightly improved compared to the GPT model with causal mask.
-11. **Ignoring the padding indeces in the loss and backpropagation (Row 1 vs. 15)**: Setting `--ignore_index 50256` excludes the `|endoftext|` padding tokens in the `cross_entropy` loss function in PyTorch. In this case, it does not have any effect because we replaced the output layers so that the token IDs are either 0 or 1 for the binary classification example. However, this setting is useful when instruction finetuning models in chapter 7.
+11. **Ignoring the padding indices in the loss and backpropagation (Row 1 vs. 15)**: Setting `--ignore_index 50256` excludes the `|endoftext|` padding tokens in the `cross_entropy` loss function in PyTorch. In this case, it does not have any effect because we replaced the output layers so that the token IDs are either 0 or 1 for the binary classification example. However, this setting is useful when instruction finetuning models in chapter 7.
diff --git a/ch06/03_bonus_imdb-classification/README.md b/ch06/03_bonus_imdb-classification/README.md
index 9027088..24e75f0 100644
--- a/ch06/03_bonus_imdb-classification/README.md
+++ b/ch06/03_bonus_imdb-classification/README.md
@@ -107,7 +107,7 @@ python train-bert-hf.py --bert_model roberta
 
 ---
 
-A scikit-learn Logistic Regression model as a basline.
+A scikit-learn Logistic Regression model as a baseline.
 
 ```bash
 python train-sklearn-logreg.py
diff --git a/ch07/01_main-chapter-code/ch07.ipynb b/ch07/01_main-chapter-code/ch07.ipynb
index 1e1976b..284faa3 100644
--- a/ch07/01_main-chapter-code/ch07.ipynb
+++ b/ch07/01_main-chapter-code/ch07.ipynb
@@ -2050,7 +2050,7 @@
     " - human preference comparison to other LLMs, such as LMSYS chatbot arena ([https://arena.lmsys.org](https://arena.lmsys.org))\n",
     " - automated conversational benchmarks, where another LLM like GPT-4 is used to evaluate the responses, such as AlpacaEval ([https://tatsu-lab.github.io/alpaca_eval/](https://tatsu-lab.github.io/alpaca_eval/))\n",
     "\n",
-    "- In the next section, we will use an approach similar to AlpaceEval and use another LLM to evaluate the responses of our model; however, we will use our own test set instead of using a publicly available benchmark dataset\n",
+    "- In the next section, we will use an approach similar to AlpacaEval and use another LLM to evaluate the responses of our model; however, we will use our own test set instead of using a publicly available benchmark dataset\n",
     "- For this, we add the model response to the `test_data` dictionary and save it as a `\"instruction-data-with-response.json\"` file for record-keeping so that we can load and analyze it in separate Python sessions if needed"
    ]
   },
@@ -2702,7 +2702,7 @@
    "## Summary and takeaways\n",
    "\n",
    "- See the [./gpt_instruction_finetuning.py](./gpt_instruction_finetuning.py) script, a self-contained script for classification finetuning\n",
-   "- [./ollama_evaluate.py](./ollama_evaluate.py) is a standalonw script based on section 7.8 that evaluates a JSON file containing \"output\" and \"response\" keys via Ollama and Llama 3\n",
+   "- [./ollama_evaluate.py](./ollama_evaluate.py) is a standalone script based on section 7.8 that evaluates a JSON file containing \"output\" and \"response\" keys via Ollama and Llama 3\n",
    "- The [./load-finetuned-model.ipynb](./load-finetuned-model.ipynb) notebook illustrates how to load the finetuned model in a new session\n",
    "- You can find the exercise solutions in [./exercise-solutions.ipynb](./exercise-solutions.ipynb)"
   ]
  },
@@ -2730,7 +2730,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-  "version": "3.11.4"
+  "version": "3.10.11"
  }
 },
 "nbformat": 4,
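Note on the `--ignore_index 50256` setting touched by the first hunk: PyTorch's `cross_entropy` skips every target position whose label equals `ignore_index`, so padding tokens contribute neither to the loss average nor to the gradients during backpropagation. A minimal sketch of that mechanism follows; the toy logits and token IDs are invented for illustration, with 50256 being the GPT-2 `<|endoftext|>` token used as padding.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(123)

# Toy logits for 4 token positions over the GPT-2 vocabulary (50,257 tokens)
logits = torch.randn(4, 50257)

# Toy targets where 50256 (<|endoftext|>) marks padded positions
targets = torch.tensor([464, 2061, 50256, 50256])

# Default: padded positions are averaged into the loss like any other token
loss_with_padding = F.cross_entropy(logits, targets)

# With ignore_index, positions labeled 50256 are excluded from the loss
# average and receive no gradient during backpropagation
loss_ignoring_padding = F.cross_entropy(logits, targets, ignore_index=50256)

print(loss_with_padding, loss_ignoring_padding)
```

As the patched README item notes, the classification experiments replace the output head so the targets are only 0 or 1; no target ever equals 50256 there, which is why the setting only matters for the instruction-finetuning setup in chapter 7.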