mirror of
				https://github.com/rasbt/LLMs-from-scratch.git
				synced 2025-10-31 18:00:08 +00:00 
			
		
		
		
	
		
			
				
	
	
		
			169 lines
		
	
	
		
			5.0 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
			
		
		
	
	
		
	
	
	
	
	
| {
 | |
|  "cells": [
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "id": "ba450fb1-8a26-4894-ab7a-5d7bfefe90ce",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "<table style=\"width:100%\">\n",
 | |
|     "<tr>\n",
 | |
|     "<td style=\"vertical-align:middle; text-align:left;\">\n",
 | |
|     "<font size=\"2\">\n",
 | |
|     "Supplementary code for the <a href=\"http://mng.bz/orYv\">Build a Large Language Model From Scratch</a> book by <a href=\"https://sebastianraschka.com\">Sebastian Raschka</a><br>\n",
 | |
|     "<br>Code repository: <a href=\"https://github.com/rasbt/LLMs-from-scratch\">https://github.com/rasbt/LLMs-from-scratch</a>\n",
 | |
|     "</font>\n",
 | |
|     "</td>\n",
 | |
|     "<td style=\"vertical-align:middle; text-align:left;\">\n",
 | |
|     "<a href=\"http://mng.bz/orYv\"><img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/cover-small.webp\" width=\"100px\"></a>\n",
 | |
|     "</td>\n",
 | |
|     "</tr>\n",
 | |
|     "</table>"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "id": "51c9672d-8d0c-470d-ac2d-1271f8ec3f14",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "# Chapter 6 Exercise solutions"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "id": "5fea8be3-30a1-4623-a6d7-b095c6c1092e",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "## Exercise 6.1: Increasing the context length"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "id": "5860ba9f-2db3-4480-b96b-4be1c68981eb",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "We can pad the inputs to the maximum number of tokens the model supports by setting `max_length` to 1024:\n",
 | |
|     "\n",
 | |
|     "```python\n",
 | |
|     "max_length = 1024\n",
 | |
|     "\n",
 | |
|     "train_dataset = SpamDataset(base_path / \"train.csv\", max_length=max_length, tokenizer=tokenizer)\n",
 | |
|     "val_dataset = SpamDataset(base_path / \"validation.csv\", max_length=max_length, tokenizer=tokenizer)\n",
 | |
|     "test_dataset = SpamDataset(base_path / \"test.csv\", max_length=max_length, tokenizer=tokenizer)\n",
 | |
|     "```\n",
 | |
|     "\n",
 | |
|     "or, equivalently, we can define the `max_length` via:\n",
 | |
|     "\n",
 | |
|     "```python\n",
 | |
|     "max_length = model.pos_emb.weight.shape[0]\n",
 | |
|     "```\n",
 | |
|     "\n",
 | |
|     "or\n",
 | |
|     "\n",
 | |
|     "```python\n",
 | |
|     "max_length = BASE_CONFIG[\"context_length\"]\n",
 | |
|     "```"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "id": "2b0f4d5d-17fd-4265-93d8-ea08a22fdaf8",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "For convenience, you can run this experiment via\n",
 | |
|     "\n",
 | |
|     "```bash\n",
 | |
|     "python additional-experiments.py --context_length \"model_context_length\"\n",
 | |
|     "```\n",
 | |
|     "\n",
 | |
|     "using the code in the [../02_bonus_additional-experiments](../02_bonus_additional-experiments) folder, which results in a substantially worse test accuracy of 78.33% (versus the 95.67% in the main chapter)."
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "id": "5a780455-f52a-48d1-ab82-6afd40bcad8b",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "## Exercise 6.2: Finetuning the whole model"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "id": "56aa5208-aa29-4165-a0ec-7480754e2a18",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "Instead of finetuning just the final transformer block, we can finetune the entire model by removing the following lines from the code:\n",
 | |
|     "\n",
 | |
|     "```python\n",
 | |
|     "for param in model.parameters():\n",
 | |
|     "    param.requires_grad = False\n",
 | |
|     "```\n",
 | |
|     "\n",
 | |
|     "For convenience, you can run this experiment via\n",
 | |
|     "\n",
 | |
|     "```bash\n",
 | |
|     "python additional-experiments.py --trainable_layers all\n",
 | |
|     "```\n",
 | |
|     "\n",
 | |
|     "using the code in the [../02_bonus_additional-experiments](../02_bonus_additional-experiments) folder, which results in a test accuracy of 96.67%, an improvement of 1 percentage point over the 95.67% in the main chapter."
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "id": "2269bce3-f2b5-4a76-a692-5977c75a57b6",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "## Exercise 6.3: Finetuning the first versus last token"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "id": "7418a629-51b6-4aa2-83b7-bc0261bc370f",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "Rather than finetuning the last output token, we can finetune the first output token by changing\n",
 | |
|     "\n",
 | |
|     "```python\n",
 | |
|     "model(input_batch)[:, -1, :]\n",
 | |
|     "```\n",
 | |
|     "\n",
 | |
|     "to\n",
 | |
|     "\n",
 | |
|     "```python\n",
 | |
|     "model(input_batch)[:, 0, :]\n",
 | |
|     "```\n",
 | |
|     "\n",
 | |
|     "everywhere in the code.\n",
 | |
|     "\n",
 | |
|     "For convenience, you can run this experiment via\n",
 | |
|     "\n",
 | |
|     "```bash\n",
 | |
|     "python additional-experiments.py --trainable_token first\n",
 | |
|     "```\n",
 | |
|     "\n",
 | |
|     "using the code in the [../02_bonus_additional-experiments](../02_bonus_additional-experiments) folder, which results in a substantially worse test accuracy of 75.00% (versus the 95.67% in the main chapter)."
 | |
|    ]
 | |
|   }
 | |
|  ],
 | |
|  "metadata": {
 | |
|   "kernelspec": {
 | |
|    "display_name": "Python 3 (ipykernel)",
 | |
|    "language": "python",
 | |
|    "name": "python3"
 | |
|   },
 | |
|   "language_info": {
 | |
|    "codemirror_mode": {
 | |
|     "name": "ipython",
 | |
|     "version": 3
 | |
|    },
 | |
|    "file_extension": ".py",
 | |
|    "mimetype": "text/x-python",
 | |
|    "name": "python",
 | |
|    "nbconvert_exporter": "python",
 | |
|    "pygments_lexer": "ipython3",
 | |
|    "version": "3.10.11"
 | |
|   }
 | |
|  },
 | |
|  "nbformat": 4,
 | |
|  "nbformat_minor": 5
 | |
| }
 | 
