Mirror of https://github.com/rasbt/LLMs-from-scratch.git (synced 2025-08-15 04:01:44 +00:00)

Commit 90b25ece3d: fixed spelling typos (#258)
Parent commit: 78b783f6fd
@@ -65,4 +65,4 @@ I've kept the LLM and dataset small on purpose, so you can run the training on a
 8. **Padding Input to Full Context Length vs. Longest Training Example (Row 1 vs. 11)**: Padding the input to the full supported context length results in significantly worse performance.
 9. **Padding vs no padding (Row 1 vs. 12 and 13)**: The `--no_padding` option disables the padding in the dataset, which requires training the model with a batch size of 1 since the inputs have variable lengths. This results in a better test accuracy but takes longer to train. In row 12, we additionally enable gradient accumulation with 8 steps to achieve the same batch size as in the other experiments, which helps reduce overfitting and slightly boosts the test set accuracy.
 10. **Disabling the causal attention mask (Row 1 vs. 14)**: Disables the causal attention mask used in the multi-head attention module. This means all tokens can attend to all other tokens. The model accuracy is slightly improved compared to the GPT model with the causal mask.
-11. **Ignoring the padding indeces in the loss and backpropagation (Row 1 vs. 15)**: Setting `--ignore_index 50256` excludes the `<|endoftext|>` padding tokens in the `cross_entropy` loss function in PyTorch. In this case, it does not have any effect because we replaced the output layers so that the token IDs are either 0 or 1 for the binary classification example. However, this setting is useful when instruction finetuning models in chapter 7.
+11. **Ignoring the padding indices in the loss and backpropagation (Row 1 vs. 15)**: Setting `--ignore_index 50256` excludes the `<|endoftext|>` padding tokens in the `cross_entropy` loss function in PyTorch. In this case, it does not have any effect because we replaced the output layers so that the token IDs are either 0 or 1 for the binary classification example. However, this setting is useful when instruction finetuning models in chapter 7.
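Item 9 above mentions gradient accumulation with 8 steps to emulate a larger batch size when `--no_padding` forces a batch size of 1. Below is a minimal, self-contained sketch of that technique; the toy linear model, optimizer settings, and random data are stand-ins for illustration, not the chapter's actual training loop.

```python
import torch

torch.manual_seed(123)
model = torch.nn.Linear(768, 2)               # stand-in for the GPT model with a 2-class output head
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)
accumulation_steps = 8                        # emulate an effective batch size of 8

# Made-up data: 16 examples with batch size 1 each (feature vectors instead of token sequences)
data = [(torch.randn(1, 768), torch.randint(0, 2, (1,))) for _ in range(16)]

for step, (features, label) in enumerate(data):
    logits = model(features)
    loss = torch.nn.functional.cross_entropy(logits, label)
    (loss / accumulation_steps).backward()    # scale so the accumulated gradients match one larger batch

    if (step + 1) % accumulation_steps == 0:  # update the weights only every 8 examples
        optimizer.step()
        optimizer.zero_grad()
```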
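Item 10 compares the causal (masked) attention variant with one where the mask is disabled. The sketch below, using made-up attention scores, shows the only mechanical difference: with the causal mask, scores above the diagonal are set to minus infinity before the softmax so tokens cannot attend to future positions.

```python
import torch

torch.manual_seed(123)
num_tokens = 4
attn_scores = torch.randn(num_tokens, num_tokens)  # unnormalized attention scores (illustrative)

# Causal variant: mask out positions above the diagonal before the softmax
mask = torch.triu(torch.ones(num_tokens, num_tokens, dtype=torch.bool), diagonal=1)
causal_weights = torch.softmax(attn_scores.masked_fill(mask, float("-inf")), dim=-1)

# Variant without the causal mask: every token can attend to every other token
bidirectional_weights = torch.softmax(attn_scores, dim=-1)

print(causal_weights)         # zeros above the diagonal
print(bidirectional_weights)  # nonzero weights everywhere
```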
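The `--ignore_index 50256` setting from item 11 maps directly to the `ignore_index` argument of PyTorch's `cross_entropy`. A minimal sketch with made-up logits and target token IDs:

```python
import torch

torch.manual_seed(123)
logits = torch.randn(4, 50257)                      # 4 token positions, GPT-2 vocabulary size
targets = torch.tensor([1107, 588, 11311, 50256])   # last position is a padding token (IDs made up)

loss_all = torch.nn.functional.cross_entropy(logits, targets)
loss_no_pad = torch.nn.functional.cross_entropy(logits, targets, ignore_index=50256)

print(loss_all, loss_no_pad)  # the second loss excludes the padded position from the average
```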
@@ -107,7 +107,7 @@ python train-bert-hf.py --bert_model roberta
 
 ---
 
-A scikit-learn Logistic Regression model as a basline.
+A scikit-learn Logistic Regression model as a baseline.
 
 ```bash
 python train-sklearn-logreg.py
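For readers curious what such a baseline involves, here is a rough, self-contained sketch of a bag-of-words logistic-regression text classifier. The actual `train-sklearn-logreg.py` script may load the data and build features differently; the tiny dataset below is made up purely for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Made-up binary text-classification data (1 = spam, 0 = not spam)
train_texts = ["win a free prize now", "are we still meeting today", "claim your reward", "see you at lunch"]
train_labels = [1, 0, 1, 0]
test_texts = ["free reward waiting", "lunch tomorrow?"]
test_labels = [1, 0]

# Bag-of-words features plus a logistic regression classifier
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, train_labels)

print("Test accuracy:", accuracy_score(test_labels, clf.predict(X_test)))
```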
@@ -2050,7 +2050,7 @@
 " - human preference comparison to other LLMs, such as LMSYS chatbot arena ([https://arena.lmsys.org](https://arena.lmsys.org))\n",
 " - automated conversational benchmarks, where another LLM like GPT-4 is used to evaluate the responses, such as AlpacaEval ([https://tatsu-lab.github.io/alpaca_eval/](https://tatsu-lab.github.io/alpaca_eval/))\n",
 "\n",
-"- In the next section, we will use an approach similar to AlpaceEval and use another LLM to evaluate the responses of our model; however, we will use our own test set instead of using a publicly available benchmark dataset\n",
+"- In the next section, we will use an approach similar to AlpacaEval and use another LLM to evaluate the responses of our model; however, we will use our own test set instead of using a publicly available benchmark dataset\n",
 "- For this, we add the model response to the `test_data` dictionary and save it as a `\"instruction-data-with-response.json\"` file for record-keeping so that we can load and analyze it in separate Python sessions if needed"
 ]
 },
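The record-keeping step in the last bullet boils down to attaching each generated answer to its test entry and writing the list back to disk. A minimal sketch with made-up data follows; the key name used for the generated answer is an assumption for illustration, not necessarily the one used in the notebook.

```python
import json

test_data = [
    {"instruction": "Rewrite the sentence in passive voice.",
     "input": "The chef cooked the meal.",
     "output": "The meal was cooked by the chef."},
]

for entry in test_data:
    # Placeholder for the text generated by the finetuned model; key name assumed for illustration
    entry["model_response"] = "The meal was cooked by the chef."

with open("instruction-data-with-response.json", "w") as f:
    json.dump(test_data, f, indent=4)  # reload later with json.load in a separate session
```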
@@ -2702,7 +2702,7 @@
 "## Summary and takeaways\n",
 "\n",
 "- See the [./gpt_instruction_finetuning.py](./gpt_instruction_finetuning.py) script, a self-contained script for classification finetuning\n",
-"- [./ollama_evaluate.py](./ollama_evaluate.py) is a standalonw script based on section 7.8 that evaluates a JSON file containing \"output\" and \"response\" keys via Ollama and Llama 3\n",
+"- [./ollama_evaluate.py](./ollama_evaluate.py) is a standalone script based on section 7.8 that evaluates a JSON file containing \"output\" and \"response\" keys via Ollama and Llama 3\n",
 "- The [./load-finetuned-model.ipynb](./load-finetuned-model.ipynb) notebook illustrates how to load the finetuned model in a new session\n",
 "- You can find the exercise solutions in [./exercise-solutions.ipynb](./exercise-solutions.ipynb)"
 ]
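The Ollama-based evaluation mentioned above can be reproduced in a few lines once a local Ollama server is running and a Llama 3 model has been pulled. The sketch below is not the repository's `ollama_evaluate.py`; it assumes the default REST endpoint at `http://localhost:11434/api/generate`, the model name `llama3`, and a made-up test record with the "output" and "response" keys.

```python
import json
import urllib.request

def query_ollama(prompt, model="llama3", url="http://localhost:11434/api/generate"):
    # Send a non-streaming generation request to a locally running Ollama server
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(url, data=payload, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["response"]

entry = {  # one record with the "output" and "response" keys mentioned above (made-up content)
    "instruction": "Name the capital of France.",
    "output": "The capital of France is Paris.",
    "response": "Paris is the capital of France.",
}

prompt = (
    f"Given the input `{entry['instruction']}` and the correct output `{entry['output']}`, "
    f"score the model response `{entry['response']}` on a scale from 0 to 100. Respond with the number only."
)
print(query_ollama(prompt))
```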
@@ -2730,7 +2730,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.11.4"
+"version": "3.10.11"
 }
 },
 "nbformat": 4,