diff --git a/ch06/01_main-chapter-code/ch06.ipynb b/ch06/01_main-chapter-code/ch06.ipynb
index 5f9b025..6b6bc23 100644
--- a/ch06/01_main-chapter-code/ch06.ipynb
+++ b/ch06/01_main-chapter-code/ch06.ipynb
@@ -1348,7 +1348,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 39,
+   "execution_count": 22,
    "id": "2aedc120-5ee3-48f6-92f2-ad9304ebcdc7",
    "metadata": {
     "id": "2aedc120-5ee3-48f6-92f2-ad9304ebcdc7"
@@ -1373,7 +1373,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 42,
+   "execution_count": 23,
    "id": "f645c06a-7df6-451c-ad3f-eafb18224ebc",
    "metadata": {
     "colab": {
@@ -1409,7 +1409,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 43,
+   "execution_count": 24,
    "id": "48dc84f1-85cc-4609-9cee-94ff539f00f4",
    "metadata": {
     "colab": {
@@ -1470,7 +1470,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 44,
+   "execution_count": 25,
    "id": "49383a8c-41d5-4dab-98f1-238bca0c2ed7",
    "metadata": {
     "colab": {
@@ -1526,7 +1526,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 45,
+   "execution_count": 26,
    "id": "c77faab1-3461-4118-866a-6171f2b89aa0",
    "metadata": {},
    "outputs": [
@@ -1547,12 +1547,12 @@
    "id": "7edd71fa-628a-4d00-b81d-6d8bcb2c341d",
    "metadata": {},
    "source": [
-    "- Similar to chapter 5, we convert the outputs (logits) into probability scores via the `softmax` function and then obtain the index position of the largest probability value via the `argmax` function:"
+    "- Similar to chapter 5, we convert the outputs (logits) into probability scores via the `softmax` function and then obtain the index position of the largest probability value via the `argmax` function"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 48,
+   "execution_count": 27,
    "id": "b81efa92-9be1-4b9e-8790-ce1fc7b17f01",
    "metadata": {},
    "outputs": [
@@ -1572,12 +1572,118 @@
   },
   {
    "cell_type": "markdown",
-   "id": "d5241f47-a1e4-4bba-8064-5d06cffa7941",
+   "id": "414a6f02-307e-4147-a416-14d115bf8179",
    "metadata": {},
    "source": [
     "- Note that the softmax function is optional here, as explained in chapter 5, because the largest outputs correspond to the largest probability scores"
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": 28,
+   "id": "f9f9ad66-4969-4501-8239-3ccdb37e71a2",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Class label: 1\n"
+     ]
+    }
+   ],
+   "source": [
+    "logits = outputs[:, -1, :]\n",
+    "label = torch.argmax(logits)\n",
+    "print(\"Class label:\", label.item())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "dcb20d3a-cbba-4ab1-8584-d94e16589505",
+   "metadata": {},
+   "source": [
+    "- We can apply this concept to calculate the so-called classification accuracy, which computes the percentage of correct predictions in a given dataset\n",
+    "- To calculate the classification accuracy, we can apply the preceding `argmax`-based prediction code to all examples in a dataset and calculate the fraction of correct predictions as follows:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 29,
+   "id": "3ecf9572-aed0-4a21-9c3b-7f9f2aec5f23",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def calc_accuracy_loader(data_loader, model, device, num_batches=None):\n",
+    "    model.eval()\n",
+    "    correct_predictions, num_examples = 0, 0\n",
+    "\n",
+    "    if num_batches is None:\n",
+    "        num_batches = len(data_loader)\n",
+    "    else:\n",
+    "        num_batches = min(num_batches, len(data_loader))\n",
+    "    for i, (input_batch, target_batch) in enumerate(data_loader):\n",
+    "        if i < num_batches:\n",
+    "            input_batch, target_batch = input_batch.to(device), target_batch.to(device)\n",
+    "\n",
+    "            with torch.no_grad():\n",
+    "                logits = model(input_batch)[:, -1, :] # Logits of last output token\n",
+    "            predicted_labels = torch.argmax(logits, dim=-1)\n",
+    "\n",
+    "            num_examples += predicted_labels.shape[0]\n",
+    "            correct_predictions += (predicted_labels == target_batch).sum().item()\n",
+    "        else:\n",
+    "            break\n",
+    "    return correct_predictions / num_examples"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7165fe46-a284-410b-957f-7524877d1a1a",
+   "metadata": {},
+   "source": [
+    "- Let's apply the function to calculate the classification accuracies for the different datasets:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 30,
+   "id": "390e5255-8427-488c-adef-e1c10ab4fb26",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Training accuracy: 46.25%\n",
+      "Validation accuracy: 45.00%\n",
+      "Test accuracy: 48.75%\n"
+     ]
+    }
+   ],
+   "source": [
+    "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
+    "model.to(device) # no assignment model = model.to(device) necessary for nn.Module classes\n",
+    "\n",
+    "torch.manual_seed(123) # For reproducibility due to the shuffling in the training data loader\n",
+    "\n",
+    "train_accuracy = calc_accuracy_loader(train_loader, model, device, num_batches=10)\n",
+    "val_accuracy = calc_accuracy_loader(val_loader, model, device, num_batches=10)\n",
+    "test_accuracy = calc_accuracy_loader(test_loader, model, device, num_batches=10)\n",
+    "\n",
+    "print(f\"Training accuracy: {train_accuracy*100:.2f}%\")\n",
+    "print(f\"Validation accuracy: {val_accuracy*100:.2f}%\")\n",
+    "print(f\"Test accuracy: {test_accuracy*100:.2f}%\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "30345e2a-afed-4d22-9486-f4010f90a871",
+   "metadata": {},
+   "source": [
+    "- As we can see, the prediction accuracies are not very good, since we haven't finetuned the model yet"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "4f4a9d15-8fc7-48a2-8734-d92a2f265328",
@@ -1585,19 +1691,14 @@
    "source": [
     "- Before we can start finetuning (/training), we first have to define the loss function we want to optimize during training\n",
     "- The goal is to maximize the spam classification accuracy of the model; however, classification accuracy is not a differentiable function\n",
-    "- Hence, instead, we minimize the cross entropy loss as a proxy for maximizing the classification accuracy (you can learn more about this topic in lecture 8 of my freely available [Introduction to Deep Learning](https://sebastianraschka.com/blog/2021/dl-course.html#l08-multinomial-logistic-regression--softmax-regression) class.\n",
+    "- Hence, instead, we minimize the cross entropy loss as a proxy for maximizing the classification accuracy (you can learn more about this topic in lecture 8 of my freely available [Introduction to Deep Learning](https://sebastianraschka.com/blog/2021/dl-course.html#l08-multinomial-logistic-regression--softmax-regression) class)\n",
     "\n",
-    "- Note that in chapter 5, we calculated the cross entropy loss for the next predicted token over the 50,257 token IDs in the vocabulary\n",
-    "- Here, we calculate the cross entropy in a similar fashion; the only difference is that instead of 50,257 token IDs, we now have only two choices: \"spam\" (label 1) or \"not spam\" (label 0).\n",
-    "- In other words, the loss calculation training code is practically identical to the one in chapter 5, but we now only have two labels instead of 50,257 labels (token IDs).\n",
-    "\n",
-    "\n",
-    "- Consequently, the `calc_loss_batch` function is the same here as in chapter 5, except that we are only interested in optimizing the last token `model(input_batch)[:, -1, :]` instead of all tokens `model(input_batch)`:"
+    "- The `calc_loss_batch` function is the same here as in chapter 5, except that we are only interested in optimizing the last token `model(input_batch)[:, -1, :]` instead of all tokens `model(input_batch)`"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 26,
+   "execution_count": 31,
    "id": "2f1e9547-806c-41a9-8aba-3b2822baabe4",
    "metadata": {
     "id": "2f1e9547-806c-41a9-8aba-3b2822baabe4"
@@ -1616,12 +1717,12 @@
    "id": "a013aab9-f854-4866-ad55-5b8350adb50a",
    "metadata": {},
    "source": [
-    "The `calc_loss_loader` is exactly the same as in chapter 5:"
+    "The `calc_loss_loader` is exactly the same as in chapter 5"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 27,
+   "execution_count": 32,
    "id": "b7b83e10-5720-45e7-ac5e-369417ca846b",
    "metadata": {},
    "outputs": [],
@@ -1651,14 +1752,12 @@
    "id": "56826ecd-6e74-40e6-b772-d3541e585067",
    "metadata": {},
    "source": [
-    "- Using the `calc_closs_loader`, we compute the initial training, validation, and test set losses before we start training\n",
-    "- Here, we use `torch.no_grad()` so that no gradients are computed during the forward pass, which reduces memory consumption and speeds up computations since we are not training the model yet\n",
-    "- Via the `device` setting, the model automatically runs on a GPU if a GPU with Nvidia CUDA support is available and otherwise runs on a CPU"
+    "- Using the `calc_loss_loader`, we compute the initial training, validation, and test set losses before we start training"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 28,
+   "execution_count": 33,
    "id": "f6f00e53-5beb-4e64-b147-f26fd481c6ff",
    "metadata": {
     "colab": {
@@ -1672,18 +1771,13 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "Training loss: 3.095\n",
+      "Training loss: 2.453\n",
       "Validation loss: 2.583\n",
      "Test loss: 2.322\n"
      ]
     }
    ],
    "source": [
-    "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
-    "model.to(device) # no assignment model = model.to(device) necessary for nn.Module classes\n",
-    "\n",
-    "torch.manual_seed(123) # For reproducibility due to the shuffling in the training data loader\n",
-    "\n",
     "with torch.no_grad(): # Disable gradient tracking for efficiency because we are not training, yet\n",
     "    train_loss = calc_loss_loader(train_loader, model, device, num_batches=5)\n",
     "    val_loss = calc_loss_loader(val_loader, model, device, num_batches=5)\n",
@@ -1694,93 +1788,12 @@
     "print(f\"Test loss: {test_loss:.3f}\")"
    ]
   },
-  {
-   "cell_type": "markdown",
-   "id": "b109556e-ddae-49fd-ad08-e6fa1032ea7a",
-   "metadata": {},
-   "source": [
-    "- Similar to the `calc_loss_loader` function above, we can define a `calc_accuracy_loader` function that calculates the classification accuracy by checking how many predicted class (spam and ham) labels match the given labels in the dataset\n",
-    "- Note that the classification accuracy is a mathematically non-differentiable function, and we only use it for evaluation; hence, we can disable the gradient calculation permanently to save resources here\n",
-    "- We can disable the gradient tracking either using the `with torch.no_grad():` inside the function or by using the `@torch.no_grad()` function decorator"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 29,
-   "id": "64ce5b12-84cd-488c-8ea7-4cef5b2d947e",
-   "metadata": {
-    "colab": {
-     "base_uri": "https://localhost:8080/"
-    },
-    "id": "64ce5b12-84cd-488c-8ea7-4cef5b2d947e",
-    "outputId": "239581b4-fd0f-4adf-e67b-364e0f0f96b7"
-   },
-   "outputs": [],
-   "source": [
-    "@torch.no_grad() # Disable gradient tracking for efficiency\n",
-    "def calc_accuracy_loader(data_loader, model, device, num_batches=None):\n",
-    "    model.eval()\n",
-    "    correct_predictions, num_examples = 0, 0\n",
-    "\n",
-    "    if num_batches is None:\n",
-    "        num_batches = len(data_loader)\n",
-    "    else:\n",
-    "        num_batches = min(num_batches, len(data_loader))\n",
-    "    for i, (input_batch, target_batch) in enumerate(data_loader):\n",
-    "        if i < num_batches:\n",
-    "            input_batch, target_batch = input_batch.to(device), target_batch.to(device)\n",
-    "            logits = model(input_batch)[:, -1, :] # Logits of last output token\n",
-    "            predicted_labels = torch.argmax(logits, dim=-1)\n",
-    "\n",
-    "            num_examples += predicted_labels.shape[0]\n",
-    "            correct_predictions += (predicted_labels == target_batch).sum().item()\n",
-    "        else:\n",
-    "            break\n",
-    "    return correct_predictions / num_examples"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "90521a9a-639c-4c7f-a5c0-aca8fa5d4c1b",
-   "metadata": {},
-   "source": [
-    "- Let's check the initial classification accuracy before we start training the model"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 30,
-   "id": "2160418f-988b-40f3-bce8-e431021e97dc",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Training accuracy: 46.25%\n",
-      "Validation accuracy: 45.00%\n",
-      "Test accuracy: 48.75%\n"
-     ]
-    }
-   ],
-   "source": [
-    "torch.manual_seed(123)\n",
-    "train_accuracy = calc_accuracy_loader(train_loader, model, device, num_batches=10)\n",
-    "val_accuracy = calc_accuracy_loader(val_loader, model, device, num_batches=10)\n",
-    "test_accuracy = calc_accuracy_loader(test_loader, model, device, num_batches=10)\n",
-    "\n",
-    "print(f\"Training accuracy: {train_accuracy*100:.2f}%\")\n",
-    "print(f\"Validation accuracy: {val_accuracy*100:.2f}%\")\n",
-    "print(f\"Test accuracy: {test_accuracy*100:.2f}%\")"
-   ]
-  },
   {
    "cell_type": "markdown",
    "id": "e04b980b-e583-4f62-84a0-4edafaf99d5d",
    "metadata": {},
    "source": [
-    "- As we can see, the model only gets roughly half (50%) of the predictions correctly\n",
-    "- In the next section, we train the model to improve the classification accuracy"
+    "- In the next section, we train the model to improve the loss values and consequently the classification accuracy"
    ]
   },
   {