mirror of https://github.com/rasbt/LLMs-from-scratch.git
synced 2025-11-02 19:00:14 +00:00

Add figures for ch06 (#141)

This commit is contained in:
parent f917fc76fe
commit c6fcadb087
@@ -25,7 +25,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 2,
+   "execution_count": 1,
   "id": "5b7e01c2-1c84-4f2a-bb51-2e0b74abda90",
   "metadata": {
    "colab": {
@@ -62,6 +62,14 @@
     "    print(f\"{p} version: {version(p)}\")"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "a445828a-ff10-4efa-9f60-a2e2aed4c87d",
+   "metadata": {},
+   "source": [
+    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/chapter-overview.webp\" width=500px>"
+   ]
+  },
  {
   "cell_type": "markdown",
   "id": "3a84cf35-b37f-4c15-8972-dfafc9fadc1c",
@@ -82,6 +90,42 @@
     "- No code in this section"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "ac45579d-d485-47dc-829e-43be7f4db57b",
+   "metadata": {},
+   "source": [
+    "- The most common ways to finetune language models are instruction-finetuning and classification finetuning\n",
+    "- Instruction-finetuning, depicted below, is the topic of the next chapter"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6c29ef42-46d9-43d4-8bb4-94974e1665e4",
+   "metadata": {},
+   "source": [
+    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/instructions.webp\" width=500px>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a7f60321-95b8-46a9-97bf-1d07fda2c3dd",
+   "metadata": {},
+   "source": [
+    "- Classification finetuning, the topic of this chapter, is a procedure you may already be familiar with if you have a background in machine learning -- it's similar to training a convolutional network to classify handwritten digits, for example\n",
+    "- In classification finetuning, we have a specific number of class labels (for example, \"spam\" and \"not spam\") that the model can output\n",
+    "- A classification-finetuned model can only predict classes it has seen during training (for example, \"spam\" or \"not spam\"), whereas an instruction-finetuned model can usually perform many tasks\n",
+    "- We can think of a classification-finetuned model as a very specialized model; in practice, it is much easier to create a specialized model than a generalist model that performs well on many different tasks"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0b37a0c4-0bb1-4061-b1fe-eaa4416d52c3",
+   "metadata": {},
+   "source": [
+    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/spam-non-spam.webp\" width=500px>"
+   ]
+  },
  {
   "cell_type": "markdown",
   "id": "8c7017a2-32aa-4002-a2f3-12aac293ccdf",
@@ -92,6 +136,14 @@
     "## 6.2 Preparing the dataset"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "5f628975-d2e8-4f7f-ab38-92bb868b7067",
+   "metadata": {},
+   "source": [
+    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/overview-1.webp\" width=500px>"
+   ]
+  },
  {
   "cell_type": "markdown",
   "id": "9fbd459f-63fa-4d8c-8499-e23103156c7d",
@@ -106,7 +158,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 3,
+   "execution_count": 2,
   "id": "def7c09b-af9c-4216-90ce-5e67aed1065c",
   "metadata": {
    "colab": {
@@ -169,7 +221,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": 3,
   "id": "da0ed4da-ac31-4e4d-8bdd-2153be4656a4",
   "metadata": {
    "colab": {
@@ -283,7 +335,7 @@
       "[5572 rows x 2 columns]"
      ]
     },
-     "execution_count": 4,
+     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
@@ -307,7 +359,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": 4,
   "id": "495a5280-9d7c-41d4-9719-64ab99056d4c",
   "metadata": {
    "colab": {
@@ -345,7 +397,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": 5,
   "id": "7be4a0a2-9704-4a96-b38f-240339818688",
   "metadata": {
    "colab": {
@@ -396,7 +448,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 7,
+   "execution_count": 6,
   "id": "c1b10c3d-5d57-42d0-8de8-cf80a06f5ffd",
   "metadata": {
    "id": "c1b10c3d-5d57-42d0-8de8-cf80a06f5ffd"
@@ -418,7 +470,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 8,
+   "execution_count": 7,
   "id": "uQl0Psdmx15D",
   "metadata": {
    "id": "uQl0Psdmx15D"
@@ -448,6 +500,14 @@
     "test_df.to_csv(\"test.csv\", index=None)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "a8d7a0c5-1d5f-458a-b685-3f49520b0094",
+   "metadata": {},
+   "source": [
+    "## 6.3 Creating data loaders"
+   ]
+  },
  {
   "cell_type": "markdown",
   "id": "7126108a-75e7-4862-b0fb-cbf59a18bb6c",
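The hunk above adds the "## 6.3 Creating data loaders" heading after the cell that writes `test.csv`. The split code itself is elided from this diff; purely as an illustration, a random train/validation/test split that produces such files could be sketched as follows (the 0.7/0.1 fractions, seed, and helper name are assumptions, not taken from the notebook):

```python
import random

def random_split(rows, train_frac=0.7, validation_frac=0.1, seed=123):
    # Shuffle a copy, then cut it into three consecutive slices:
    # train, validation, and whatever remains as test.
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    train_end = int(len(rows) * train_frac)
    validation_end = train_end + int(len(rows) * validation_frac)
    return rows[:train_end], rows[train_end:validation_end], rows[validation_end:]

# 5572 rows, matching the "[5572 rows x 2 columns]" DataFrame above
train, validation, test = random_split(range(5572))
```

The three slices are disjoint and together cover every row exactly once, so no message leaks between the saved CSV files.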
@@ -465,7 +525,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 9,
+   "execution_count": 8,
   "id": "74c3c463-8763-4cc0-9320-41c7eaad8ab7",
   "metadata": {
    "colab": {
@@ -490,6 +550,27 @@
     "print(tokenizer.encode(\"<|endoftext|>\", allowed_special={\"<|endoftext|>\"}))"
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "id": "0ff0f6b2-376b-4740-8858-55b60784be73",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[42, 13, 314, 481, 1908, 340, 757]"
+      ]
+     },
+     "execution_count": 9,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "tokenizer.encode(\"K. I will sent it again\")"
+   ]
+  },
  {
   "cell_type": "markdown",
   "id": "04f582ff-68bf-450e-bd87-5fb61afe431c",
@@ -500,6 +581,14 @@
     "- The `SpamDataset` class below identifies the longest sequence in the training dataset and adds the padding token to the others to match that sequence length"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "0829f33f-1428-4f22-9886-7fee633b3666",
+   "metadata": {},
+   "source": [
+    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/pad-input-sequences.webp\" width=500px>"
+   ]
+  },
  {
   "cell_type": "code",
   "execution_count": 10,
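The padding step that `SpamDataset` performs, as described in the cell above, can be sketched in plain Python. This assumes GPT-2's `<|endoftext|>` token (ID 50256, printed by the `allowed_special` cell earlier) serves as the padding token; it is a minimal illustration, not the book's `SpamDataset` code:

```python
PAD_TOKEN_ID = 50256  # GPT-2's <|endoftext|> token, reused here as padding

def pad_to_longest(encoded_texts):
    # Find the longest tokenized text, then right-pad every other
    # text with the padding token so all rows share that length.
    max_length = max(len(tokens) for tokens in encoded_texts)
    return [tokens + [PAD_TOKEN_ID] * (max_length - len(tokens))
            for tokens in encoded_texts]

batch = pad_to_longest([[42, 13, 314], [5211, 345, 423, 640]])
# both rows now have length 4
```

Equal-length rows are what allow the data loaders in the next section to stack examples into a single batch tensor.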
@@ -611,6 +700,14 @@
     "- Next, we use the dataset to instantiate the data loaders, which is similar to creating the data loaders in previous chapters:"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "64bcc349-205f-48f8-9655-95ff21f5e72f",
+   "metadata": {},
+   "source": [
+    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/batch.webp\" width=500px>"
+   ]
+  },
  {
   "cell_type": "code",
   "execution_count": 13,
@@ -730,7 +827,7 @@
    "id": "d1c4f61a-5f5d-4b3b-97cf-151b617d1d6c"
   },
   "source": [
-    "## 6.3 Initializing a model with pretrained weights"
+    "## 6.4 Initializing a model with pretrained weights"
   ]
  },
  {
@@ -738,7 +835,9 @@
   "id": "97e1af8b-8bd1-4b44-8b8b-dc031496e208",
   "metadata": {},
   "source": [
-    "- In this section, we initialize the pretrained model we worked with in the previous chapter"
+    "- In this section, we initialize the pretrained model we worked with in the previous chapter\n",
+    "\n",
+    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/overview-2.webp\" width=500px>"
   ]
  },
  {
@@ -819,43 +918,86 @@
  {
   "cell_type": "code",
   "execution_count": 18,
-   "id": "fe4af171-5dce-4f6e-9b63-1e4e16e8b94c",
-   "metadata": {
-    "colab": {
-     "base_uri": "https://localhost:8080/"
-    },
-    "id": "fe4af171-5dce-4f6e-9b63-1e4e16e8b94c",
-    "outputId": "8ff3ec54-1dc3-4930-9be6-8eeaf560f8d4"
-   },
+   "id": "d8ac25ff-74b1-4149-8dc5-4c429d464330",
+   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
-      "Output text: Every effort moves you forward.\n"
+      "Every effort moves you forward.\n",
+      "\n",
+      "The first step is to understand the importance of your work\n"
     ]
    }
   ],
   "source": [
-    "from previous_chapters import generate_text_simple\n",
+    "from previous_chapters import (\n",
+    "    generate_text_simple,\n",
+    "    text_to_token_ids,\n",
+    "    token_ids_to_text\n",
+    ")\n",
    "\n",
-    "start_context = \"Every effort moves you\"\n",
-    "\n",
-    "tokenizer = tiktoken.get_encoding(\"gpt2\")\n",
-    "encoded = tokenizer.encode(start_context)\n",
-    "encoded_tensor = torch.tensor(encoded).unsqueeze(0)\n",
+    "text_1 = \"Every effort moves you\"\n",
    "\n",
-    "out = generate_text_simple(\n",
+    "token_ids = generate_text_simple(\n",
    "    model=model,\n",
-    "    idx=encoded_tensor,\n",
+    "    idx=text_to_token_ids(text_1, tokenizer),\n",
    "    max_new_tokens=15,\n",
    "    context_size=BASE_CONFIG[\"context_length\"]\n",
    ")\n",
-    "decoded_text = tokenizer.decode(out.squeeze(0).tolist())\n",
    "\n",
-    "print(\"Output text:\", decoded_text)"
+    "print(token_ids_to_text(token_ids, tokenizer))"
   ]
  },
+ {
+  "cell_type": "markdown",
+  "id": "69162550-6a02-4ece-8db1-06c71d61946f",
+  "metadata": {},
+  "source": [
+   "- Before we finetune the model as a classifier, let's see if the model can perhaps already classify spam messages via prompting"
+  ]
+ },
+ {
+  "cell_type": "code",
+  "execution_count": 19,
+  "id": "94224aa9-c95a-4f8a-a420-76d01e3a800c",
+  "metadata": {},
+  "outputs": [
+   {
+    "name": "stdout",
+    "output_type": "stream",
+    "text": [
+     "Is the following text 'spam'? Answer with 'yes' or 'no': 'You are a winner you have been specially selected to receive $1000 cash or a $2000 award.' Answer with 'yes' or 'no'. Answer with 'yes' or 'no'. Answer with 'yes' or 'no'. Answer with 'yes'\n"
+    ]
+   }
+  ],
+  "source": [
+   "text_2 = (\n",
+   "    \"Is the following text 'spam'? Answer with 'yes' or 'no':\"\n",
+   "    \" 'You are a winner you have been specially\"\n",
+   "    \" selected to receive $1000 cash or a $2000 award.'\"\n",
+   "    \" Answer with 'yes' or 'no'.\"\n",
+   ")\n",
+   "\n",
+   "token_ids = generate_text_simple(\n",
+   "    model=model,\n",
+   "    idx=text_to_token_ids(text_2, tokenizer),\n",
+   "    max_new_tokens=23,\n",
+   "    context_size=BASE_CONFIG[\"context_length\"]\n",
+   ")\n",
+   "\n",
+   "print(token_ids_to_text(token_ids, tokenizer))"
+  ]
+ },
+ {
+  "cell_type": "markdown",
+  "id": "1ce39ed0-2c77-410d-8392-dd15d4b22016",
+  "metadata": {},
+  "source": [
+   "- As we can see, the model is not very good at following instructions\n",
+   "- This is expected, since it has only been pretrained and not instruction-finetuned (instruction finetuning will be covered in the next chapter)"
+  ]
+ },
  {
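Both generation cells in the hunk above call `generate_text_simple` from earlier chapters. Its core idea, repeatedly appending the most likely next token (greedy decoding), can be sketched independently of the model; the `next_logits` callback below is a stand-in assumption for a real forward pass, not the book's implementation:

```python
def greedy_generate(next_logits, token_ids, max_new_tokens):
    # next_logits(ids) is assumed to return one logit per vocabulary
    # entry for the next token; we append the argmax each step.
    for _ in range(max_new_tokens):
        logits = next_logits(token_ids)
        next_id = max(range(len(logits)), key=lambda i: logits[i])
        token_ids = token_ids + [next_id]
    return token_ids

# Toy "model" over a 3-token vocabulary that always favors token 2:
toy = lambda ids: [0.1, 0.2, 0.7]
greedy_generate(toy, [0], max_new_tokens=2)  # → [0, 2, 2]
```

Greedy decoding with a fixed-point prompt also explains the repetitive "Answer with 'yes' or 'no'." loop in the spam-prompt output: without instruction finetuning, the most likely continuation is often an echo of the prompt.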
@@ -865,7 +1007,15 @@
    "id": "4c9ae440-32f9-412f-96cf-fd52cc3e2522"
   },
   "source": [
-    "## 6.4 Adding a classification head"
+    "## 6.5 Adding a classification head"
   ]
  },
+ {
+  "cell_type": "markdown",
+  "id": "d6e9d66f-76b2-40fc-9ec5-3f972a8db9c0",
+  "metadata": {},
+  "source": [
+   "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/lm-head.webp\" width=500px>"
+  ]
+ },
  {
@@ -879,7 +1029,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 19,
+   "execution_count": 20,
   "id": "b23aff91-6bd0-48da-88f6-353657e6c981",
   "metadata": {
    "colab": {
@@ -1149,7 +1299,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 20,
+   "execution_count": 21,
   "id": "fkMWFl-0etea",
   "metadata": {
    "id": "fkMWFl-0etea"
@@ -1171,7 +1321,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 21,
+   "execution_count": 22,
   "id": "7e759fa0-0f69-41be-b576-17e5f20e04cb",
   "metadata": {},
   "outputs": [],
@@ -1192,9 +1342,17 @@
    "- So, we are also making the last transformer block and the final `LayerNorm` module connecting the last transformer block to the output layer trainable"
   ]
  },
+ {
+  "cell_type": "markdown",
+  "id": "0be7c1eb-c46c-4065-8525-eea1b8c66d10",
+  "metadata": {},
+  "source": [
+   "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/trainable.webp\" width=500px>"
+  ]
+ },
  {
   "cell_type": "code",
-   "execution_count": 22,
+   "execution_count": 23,
   "id": "2aedc120-5ee3-48f6-92f2-ad9304ebcdc7",
   "metadata": {
    "id": "2aedc120-5ee3-48f6-92f2-ad9304ebcdc7"
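The selective-finetuning idea in the cell above (freeze everything, then make only the output head, the final `LayerNorm`, and the last transformer block trainable) can be illustrated without PyTorch. The parameter names below are hypothetical stand-ins, not the model's real parameter names:

```python
# Hypothetical parameter names; True means the parameter stays trainable
# (the analogue of setting requires_grad = True in PyTorch).
trainable = {name: False for name in [
    "tok_emb.weight",
    "trf_blocks.0.att.W_query",
    "trf_blocks.11.att.W_query",   # last transformer block
    "final_norm.scale",
    "out_head.weight",             # new classification head
]}

# Unfreeze only the classification head, final LayerNorm, and last block.
for name in trainable:
    if name.startswith(("out_head", "final_norm", "trf_blocks.11")):
        trainable[name] = True
```

Training only these layers greatly reduces the number of updated parameters while still letting the model adapt its top-most representations to the spam task.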
@@ -1219,7 +1377,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 23,
+   "execution_count": 24,
   "id": "f645c06a-7df6-451c-ad3f-eafb18224ebc",
   "metadata": {
    "colab": {
@@ -1233,13 +1391,13 @@
     "name": "stdout",
     "output_type": "stream",
     "text": [
-      "Inputs: tensor([[  40, 1107, 8288,  428, 3807,   13]])\n",
-      "Inputs dimensions: torch.Size([1, 6])\n"
+      "Inputs: tensor([[5211,  345,  423,  640]])\n",
+      "Inputs dimensions: torch.Size([1, 4])\n"
     ]
    }
   ],
   "source": [
-    "inputs = tokenizer.encode(\"I really liked this movie.\")\n",
+    "inputs = tokenizer.encode(\"Do you have time\")\n",
    "inputs = torch.tensor(inputs).unsqueeze(0)\n",
    "print(\"Inputs:\", inputs)\n",
    "print(\"Inputs dimensions:\", inputs.shape) # shape: (batch_size, num_tokens)"
@@ -1255,7 +1413,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 24,
+   "execution_count": 25,
   "id": "48dc84f1-85cc-4609-9cee-94ff539f00f4",
   "metadata": {
    "colab": {
@@ -1270,13 +1428,11 @@
     "output_type": "stream",
     "text": [
      "Outputs:\n",
-      " tensor([[[-1.9044,  1.5321],\n",
-      "         [-4.9851,  8.5136],\n",
-      "         [-1.6985,  4.6314],\n",
-      "         [-2.3820,  5.7547],\n",
-      "         [-3.8736,  4.4867],\n",
-      "         [-5.7543,  5.3615]]])\n",
-      "Outputs dimensions: torch.Size([1, 6, 2])\n"
+      " tensor([[[-1.5854,  0.9904],\n",
+      "         [-3.7235,  7.4548],\n",
+      "         [-2.2661,  6.6049],\n",
+      "         [-3.5983,  3.9902]]])\n",
+      "Outputs dimensions: torch.Size([1, 4, 2])\n"
     ]
    }
   ],
@@ -1288,6 +1444,14 @@
    "print(\"Outputs dimensions:\", outputs.shape) # shape: (batch_size, num_tokens, num_classes)"
   ]
  },
+ {
+  "cell_type": "markdown",
+  "id": "7df9144f-6817-4be4-8d4b-5d4dadfe4a9b",
+  "metadata": {},
+  "source": [
+   "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/input-and-output.webp\" width=500px>"
+  ]
+ },
  {
   "cell_type": "markdown",
   "id": "e3bb8616-c791-4f5c-bac0-5302f663e46a",
@@ -1325,12 +1489,28 @@
    "print(\"Last output token:\", outputs[:, -1, :])"
   ]
  },
+ {
+  "cell_type": "markdown",
+  "id": "8df08ae0-e664-4670-b7c5-8a2280d9b41b",
+  "metadata": {},
+  "source": [
+   "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/attention-mask.webp\" width=200px>"
+  ]
+ },
  {
   "cell_type": "markdown",
   "id": "32aa4aef-e1e9-491b-9adf-5aa973e59b8c",
   "metadata": {},
   "source": [
-    "## 6.5 Calculating the classification loss and accuracy"
+    "## 6.6 Calculating the classification loss and accuracy"
   ]
  },
+ {
+  "cell_type": "markdown",
+  "id": "669e1fd1-ace8-44b4-b438-185ed0ba8b33",
+  "metadata": {},
+  "source": [
+   "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/overview-3.webp\" width=500px>"
+  ]
+ },
  {
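As the `outputs[:, -1, :]` line above shows, only the last token's logits are used for classification. Stripped of PyTorch, the prediction step amounts to an argmax over the final row of each sequence's logits. The sketch below uses the (1, 4, 2)-shaped example output printed earlier; mapping class index 1 to "spam" is an assumption for illustration:

```python
def predict_labels(batch_logits):
    # batch_logits: one list per sequence; each entry holds per-token
    # two-class logits. Keep only the last token's logits (analogous
    # to outputs[:, -1, :]) and take the argmax as the class index.
    predictions = []
    for sequence_logits in batch_logits:
        last = sequence_logits[-1]
        predictions.append(max(range(len(last)), key=lambda i: last[i]))
    return predictions

# Logits shaped like the torch.Size([1, 4, 2]) example output above
outputs = [[[-1.5854, 0.9904], [-3.7235, 7.4548],
            [-2.2661, 6.6049], [-3.5983, 3.9902]]]
predict_labels(outputs)  # → [1]
```

Comparing such predictions against the true labels over a data loader is what the accuracy calculation in this section does batch by batch.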
@@ -1545,7 +1725,7 @@
    "id": "456ae0fd-6261-42b4-ab6a-d24289953083"
   },
   "source": [
-    "## 6.6 Finetuning the model on supervised data"
+    "## 6.7 Finetuning the model on supervised data"
   ]
  },
  {
@@ -1560,6 +1740,14 @@
    "  2. calculate the accuracy after each epoch instead of printing a sample text after each epoch"
   ]
  },
+ {
+  "cell_type": "markdown",
+  "id": "979b6222-1dc2-4530-9d01-b6b04fe3de12",
+  "metadata": {},
+  "source": [
+   "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/training-loop.webp\" width=500px>"
+  ]
+ },
  {
   "cell_type": "code",
   "execution_count": 31,
@@ -1868,7 +2056,15 @@
   "id": "a74d9ad7-3ec1-450e-8c9f-4fc46d3d5bb0",
   "metadata": {},
   "source": [
-    "## 6.7 Using the LLM as a SPAM classifier"
+    "## 6.8 Using the LLM as a SPAM classifier"
   ]
  },
+ {
+  "cell_type": "markdown",
+  "id": "72ebcfa2-479e-408b-9cf0-7421f6144855",
+  "metadata": {},
+  "source": [
+   "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/overview-4.webp\" width=500px>"
+  ]
+ },
  {
@@ -2069,7 +2265,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-   "version": "3.10.6"
+   "version": "3.11.4"
  }
 },
 "nbformat": 4,