"Supplementary code for \"Build a Large Language Model From Scratch\": <a href=\"https://www.manning.com/books/build-a-large-language-model-from-scratch\">https://www.manning.com/books/build-a-large-language-model-from-scratch</a> by <a href=\"https://sebastianraschka.com\">Sebastian Raschka</a><br>\n",
"using the code in the [../02_bonus_additional-experiments](../02_bonus_additional-experiments) folder, which results in a substantially worse test accuracy of 78.33% (versus the 95.67% in the main chapter)."
]
},
{
"cell_type": "markdown",
"id": "5a780455-f52a-48d1-ab82-6afd40bcad8b",
"metadata": {},
"source": [
"## Exercise 6.2: Finetuning the whole model"
]
},
{
"cell_type": "markdown",
"id": "56aa5208-aa29-4165-a0ec-7480754e2a18",
"metadata": {},
"source": [
"Instead of finetuning just the final transformer block, we can finetune the entire model by removing the following lines from the code:\n",
"\n",
"```python\n",
"for param in model.parameters():\n",
" param.requires_grad = False\n",
"```\n",
"\n",
"For convenience, you can run this experiment via\n",
"using the code in the [../02_bonus_additional-experiments](../02_bonus_additional-experiments) folder, which results in a 1% improved test accuracy of 96.67% (versus the 95.67% in the main chapter)."
]
},
{
"cell_type": "markdown",
"id": "2269bce3-f2b5-4a76-a692-5977c75a57b6",
"metadata": {},
"source": [
"## Exercise 6.3: Finetuning the first versus last token "
"using the code in the [../02_bonus_additional-experiments](../02_bonus_additional-experiments) folder, which results in a substantially worse test accuracy of 75.00% (versus the 95.67% in the main chapter)."