LLMs-from-scratch/ch06/01_main-chapter-code/exercise-solutions.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "ba450fb1-8a26-4894-ab7a-5d7bfefe90ce",
   "metadata": {},
   "source": [
    "<font size=\"1\">\n",
    "Supplementary code for \"Build a Large Language Model From Scratch\": <a href=\"https://www.manning.com/books/build-a-large-language-model-from-scratch\">https://www.manning.com/books/build-a-large-language-model-from-scratch</a> by <a href=\"https://sebastianraschka.com\">Sebastian Raschka</a><br>\n",
    "Code repository: <a href=\"https://github.com/rasbt/LLMs-from-scratch\">https://github.com/rasbt/LLMs-from-scratch</a>\n",
    "</font>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "51c9672d-8d0c-470d-ac2d-1271f8ec3f14",
   "metadata": {},
   "source": [
    "# Chapter 6 Exercise solutions"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5fea8be3-30a1-4623-a6d7-b095c6c1092e",
   "metadata": {},
   "source": [
    "## Exercise 6.1: Increasing the context length"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5860ba9f-2db3-4480-b96b-4be1c68981eb",
   "metadata": {},
   "source": [
    "We can pad the inputs to the maximum number of tokens the model supports by setting the max length to 1024:\n",
    "\n",
    "```python\n",
    "max_length = 1024\n",
    "\n",
    "train_dataset = SpamDataset(base_path / \"train.csv\", max_length=max_length, tokenizer=tokenizer)\n",
    "val_dataset = SpamDataset(base_path / \"validation.csv\", max_length=max_length, tokenizer=tokenizer)\n",
    "test_dataset = SpamDataset(base_path / \"test.csv\", max_length=max_length, tokenizer=tokenizer)\n",
    "\n",
    "```\n",
    "\n",
    "or, equivalently, we can define the `max_length` via:\n",
    "\n",
    "```python\n",
    "max_length = model.pos_emb.weight.shape[0]\n",
    "```\n",
    "\n",
    "or\n",
    "\n",
    "```python\n",
    "max_length = BASE_CONFIG[\"context_length\"]\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2b0f4d5d-17fd-4265-93d8-ea08a22fdaf8",
   "metadata": {},
   "source": [
    "For convenience, you can run this experiment via\n",
    "\n",
    "```\n",
    "python additional-experiments.py --context_length \"model_context_length\"\n",
    "```\n",
    "\n",
    "using the code in the [../02_bonus_additional-experiments](../02_bonus_additional-experiments) folder, which results in a substantially worse test accuracy of 78.33% (versus the 95.67% in the main chapter)."
   ]
  },
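  {
   "cell_type": "markdown",
   "id": "c3a9e1f2-7b4d-4e6a-8f10-2d5b6c7a8e91",
   "metadata": {},
   "source": [
    "As a rough illustration of what padding to `max_length` involves in Exercise 6.1, the sketch below pads (or truncates) a list of token IDs to a fixed length. The helper name `pad_to_length` is hypothetical and not part of the chapter code, and the padding ID 50256 (the GPT-2 `<|endoftext|>` token) is an assumption about how `SpamDataset` pads its inputs:\n",
    "\n",
    "```python\n",
    "def pad_to_length(token_ids, max_length, pad_token_id=50256):\n",
    "    # Hypothetical helper for illustration only.\n",
    "    # Truncate sequences longer than max_length ...\n",
    "    trimmed = token_ids[:max_length]\n",
    "    # ... and right-pad shorter ones with pad_token_id.\n",
    "    return trimmed + [pad_token_id] * (max_length - len(trimmed))\n",
    "\n",
    "\n",
    "padded = pad_to_length([11, 22, 33], max_length=8)\n",
    "print(len(padded), padded.count(50256))\n",
    "```\n",
    "\n",
    "With `max_length = 1024`, a short text message consists mostly of padding tokens rather than message content."
   ]
  },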
  {
   "cell_type": "markdown",
   "id": "5a780455-f52a-48d1-ab82-6afd40bcad8b",
   "metadata": {},
   "source": [
    "## Exercise 6.2: Finetuning the whole model"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "56aa5208-aa29-4165-a0ec-7480754e2a18",
   "metadata": {},
   "source": [
    "Instead of finetuning just the final transformer block, we can finetune the entire model by removing the following lines from the code:\n",
    "\n",
    "```python\n",
    "for param in model.parameters():\n",
    "    param.requires_grad = False\n",
    "```\n",
    "\n",
    "For convenience, you can run this experiment via\n",
    "\n",
    "```\n",
    "python additional-experiments.py --trainable_layers all\n",
    "```\n",
    "\n",
    "using the code in the [../02_bonus_additional-experiments](../02_bonus_additional-experiments) folder, which results in a 1% improved test accuracy of 96.67% (versus the 95.67% in the main chapter)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2269bce3-f2b5-4a76-a692-5977c75a57b6",
   "metadata": {},
   "source": [
    "## Exercise 6.3: Finetuning the first versus last token "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7418a629-51b6-4aa2-83b7-bc0261bc370f",
   "metadata": {},
   "source": [
    "Rather than finetuning the last output token, we can finetune the first output token by changing \n",
    "\n",
    "```python\n",
    "model(input_batch)[:, -1, :]\n",
    "```\n",
    "\n",
    "to\n",
    "\n",
    "```python\n",
    "model(input_batch)[:, 0, :]\n",
    "```\n",
    "\n",
    "everywhere in the code.\n",
    "\n",
    "For convenience, you can run this experiment via\n",
    "\n",
    "```\n",
    "python additional-experiments.py --trainable_token first\n",
    "```\n",
    "\n",
    "using the code in the [../02_bonus_additional-experiments](../02_bonus_additional-experiments) folder, which results in a substantially worse test accuracy of 75.00% (versus the 95.67% in the main chapter)."
   ]
  },
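  {
   "cell_type": "markdown",
   "id": "d7b2f4a6-1c3e-4a9b-b5d8-6e0f1a2c3b47",
   "metadata": {},
   "source": [
    "One plausible reason for the gap in Exercise 6.3, assuming the model uses the causal attention mask of a GPT-style architecture: each position can attend only to itself and earlier positions, so the first output token sees a single input token while the last one sees the whole sequence. The sketch below counts the visible positions per token; the function name `visible_positions` is hypothetical and used only for illustration:\n",
    "\n",
    "```python\n",
    "def visible_positions(seq_len):\n",
    "    # Build a causal mask: row i is True for columns 0..i.\n",
    "    mask = [[col <= row for col in range(seq_len)] for row in range(seq_len)]\n",
    "    # Count how many input positions each output token can attend to.\n",
    "    return [sum(row) for row in mask]\n",
    "\n",
    "\n",
    "counts = visible_positions(5)\n",
    "print(counts[0], counts[-1])\n",
    "```\n",
    "\n",
    "Here `counts[0]` corresponds to the `[:, 0, :]` selection and `counts[-1]` to the `[:, -1, :]` selection."
   ]
  },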
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e5e6188a-f182-4f26-b9e5-ccae3ecadae0",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}