mirror of https://github.com/rasbt/LLMs-from-scratch.git
synced 2025-11-02 19:00:14 +00:00

Add figures for ch06 (#141)

This commit is contained in:
parent f917fc76fe
commit c6fcadb087
@@ -25,7 +25,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 2,
+   "execution_count": 1,
   "id": "5b7e01c2-1c84-4f2a-bb51-2e0b74abda90",
   "metadata": {
    "colab": {
@@ -62,6 +62,14 @@
     "    print(f\"{p} version: {version(p)}\")"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "a445828a-ff10-4efa-9f60-a2e2aed4c87d",
+   "metadata": {},
+   "source": [
+    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/chapter-overview.webp\" width=500px>"
+   ]
+  },
  {
   "cell_type": "markdown",
   "id": "3a84cf35-b37f-4c15-8972-dfafc9fadc1c",
@@ -82,6 +90,42 @@
     "- No code in this section"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "ac45579d-d485-47dc-829e-43be7f4db57b",
+   "metadata": {},
+   "source": [
+    "- The most common ways to finetune language models are instruction-finetuning and classification finetuning\n",
+    "- Instruction-finetuning, depicted below, is the topic of the next chapter"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6c29ef42-46d9-43d4-8bb4-94974e1665e4",
+   "metadata": {},
+   "source": [
+    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/instructions.webp\" width=500px>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a7f60321-95b8-46a9-97bf-1d07fda2c3dd",
+   "metadata": {},
+   "source": [
+    "- Classification finetuning, the topic of this chapter, is a procedure you may already be familiar with if you have a background in machine learning -- it's similar to training a convolutional network to classify handwritten digits, for example\n",
+    "- In classification finetuning, we have a specific number of class labels (for example, \"spam\" and \"not spam\") that the model can output\n",
+    "- A classification-finetuned model can only predict classes it has seen during training (for example, \"spam\" or \"not spam\"), whereas an instruction-finetuned model can usually perform many tasks\n",
+    "- We can think of a classification-finetuned model as a very specialized model; in practice, it is much easier to create a specialized model than a generalist model that performs well on many different tasks"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0b37a0c4-0bb1-4061-b1fe-eaa4416d52c3",
+   "metadata": {},
+   "source": [
+    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/spam-non-spam.webp\" width=500px>"
+   ]
+  },
  {
   "cell_type": "markdown",
   "id": "8c7017a2-32aa-4002-a2f3-12aac293ccdf",
@@ -92,6 +136,14 @@
     "## 6.2 Preparing the dataset"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "5f628975-d2e8-4f7f-ab38-92bb868b7067",
+   "metadata": {},
+   "source": [
+    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/overview-1.webp\" width=500px>"
+   ]
+  },
  {
   "cell_type": "markdown",
   "id": "9fbd459f-63fa-4d8c-8499-e23103156c7d",
@@ -106,7 +158,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 3,
+   "execution_count": 2,
   "id": "def7c09b-af9c-4216-90ce-5e67aed1065c",
   "metadata": {
    "colab": {
@@ -169,7 +221,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": 3,
   "id": "da0ed4da-ac31-4e4d-8bdd-2153be4656a4",
   "metadata": {
    "colab": {
@@ -283,7 +335,7 @@
       "[5572 rows x 2 columns]"
      ]
     },
-     "execution_count": 4,
+     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
@@ -307,7 +359,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": 4,
   "id": "495a5280-9d7c-41d4-9719-64ab99056d4c",
   "metadata": {
    "colab": {
@@ -345,7 +397,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": 5,
   "id": "7be4a0a2-9704-4a96-b38f-240339818688",
   "metadata": {
    "colab": {
@@ -396,7 +448,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 7,
+   "execution_count": 6,
   "id": "c1b10c3d-5d57-42d0-8de8-cf80a06f5ffd",
   "metadata": {
    "id": "c1b10c3d-5d57-42d0-8de8-cf80a06f5ffd"
@@ -418,7 +470,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 8,
+   "execution_count": 7,
   "id": "uQl0Psdmx15D",
   "metadata": {
    "id": "uQl0Psdmx15D"
@@ -448,6 +500,14 @@
     "test_df.to_csv(\"test.csv\", index=None)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "a8d7a0c5-1d5f-458a-b685-3f49520b0094",
+   "metadata": {},
+   "source": [
+    "## 6.3 Creating data loaders"
+   ]
+  },
  {
   "cell_type": "markdown",
   "id": "7126108a-75e7-4862-b0fb-cbf59a18bb6c",
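The hunk above adds the "## 6.3 Creating data loaders" heading after the cell that writes `test.csv`. The split code itself is elided from this diff; purely as an illustration, a random train/validation/test split that produces such files could be sketched as follows (the 0.7/0.1 fractions, seed, and helper name are assumptions, not taken from the notebook):

```python
import random

def random_split(rows, train_frac=0.7, validation_frac=0.1, seed=123):
    # Shuffle a copy, then cut it into three consecutive slices:
    # train, validation, and whatever remains as test.
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    train_end = int(len(rows) * train_frac)
    validation_end = train_end + int(len(rows) * validation_frac)
    return rows[:train_end], rows[train_end:validation_end], rows[validation_end:]

# 5572 rows, matching the "[5572 rows x 2 columns]" DataFrame above
train, validation, test = random_split(range(5572))
```

The three slices are disjoint and together cover every row exactly once, so no message leaks between the saved CSV files.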
@@ -465,7 +525,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 9,
+   "execution_count": 8,
   "id": "74c3c463-8763-4cc0-9320-41c7eaad8ab7",
   "metadata": {
    "colab": {
@@ -490,6 +550,27 @@
     "print(tokenizer.encode(\"<|endoftext|>\", allowed_special={\"<|endoftext|>\"}))"
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "id": "0ff0f6b2-376b-4740-8858-55b60784be73",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[42, 13, 314, 481, 1908, 340, 757]"
+      ]
+     },
+     "execution_count": 9,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "tokenizer.encode(\"K. I will sent it again\")"
+   ]
+  },
  {
   "cell_type": "markdown",
   "id": "04f582ff-68bf-450e-bd87-5fb61afe431c",
@@ -500,6 +581,14 @@
     "- The `SpamDataset` class below identifies the longest sequence in the training dataset and adds the padding token to the others to match that sequence length"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "0829f33f-1428-4f22-9886-7fee633b3666",
+   "metadata": {},
+   "source": [
+    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/pad-input-sequences.webp\" width=500px>"
+   ]
+  },
  {
   "cell_type": "code",
   "execution_count": 10,
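The padding step that `SpamDataset` performs, as described in the cell above, can be sketched in plain Python. This assumes GPT-2's `<|endoftext|>` token (ID 50256, printed by the `allowed_special` cell earlier) serves as the padding token; it is a minimal illustration, not the book's `SpamDataset` code:

```python
PAD_TOKEN_ID = 50256  # GPT-2's <|endoftext|> token, reused here as padding

def pad_to_longest(encoded_texts):
    # Find the longest tokenized text, then right-pad every other
    # text with the padding token so all rows share that length.
    max_length = max(len(tokens) for tokens in encoded_texts)
    return [tokens + [PAD_TOKEN_ID] * (max_length - len(tokens))
            for tokens in encoded_texts]

batch = pad_to_longest([[42, 13, 314], [5211, 345, 423, 640]])
# both rows now have length 4
```

Equal-length rows are what allow the data loaders in the next section to stack examples into a single batch tensor.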
@@ -611,6 +700,14 @@
     "- Next, we use the dataset to instantiate the data loaders, which is similar to creating the data loaders in previous chapters:"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "64bcc349-205f-48f8-9655-95ff21f5e72f",
+   "metadata": {},
+   "source": [
+    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/batch.webp\" width=500px>"
+   ]
+  },
  {
   "cell_type": "code",
   "execution_count": 13,
@@ -730,7 +827,7 @@
    "id": "d1c4f61a-5f5d-4b3b-97cf-151b617d1d6c"
   },
   "source": [
-    "## 6.3 Initializing a model with pretrained weights"
+    "## 6.4 Initializing a model with pretrained weights"
   ]
  },
  {
@@ -738,7 +835,9 @@
   "id": "97e1af8b-8bd1-4b44-8b8b-dc031496e208",
   "metadata": {},
   "source": [
-    "- In this section, we initialize the pretrained model we worked with in the previous chapter"
+    "- In this section, we initialize the pretrained model we worked with in the previous chapter\n",
+    "\n",
+    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/overview-2.webp\" width=500px>"
   ]
  },
  {
@@ -819,43 +918,86 @@
  {
   "cell_type": "code",
   "execution_count": 18,
-   "id": "fe4af171-5dce-4f6e-9b63-1e4e16e8b94c",
-   "metadata": {
-    "colab": {
-     "base_uri": "https://localhost:8080/"
-    },
-    "id": "fe4af171-5dce-4f6e-9b63-1e4e16e8b94c",
-    "outputId": "8ff3ec54-1dc3-4930-9be6-8eeaf560f8d4"
-   },
+   "id": "d8ac25ff-74b1-4149-8dc5-4c429d464330",
+   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
-      "Output text: Every effort moves you forward.\n"
+      "Every effort moves you forward.\n",
+      "\n",
+      "The first step is to understand the importance of your work\n"
     ]
    }
   ],
   "source": [
-    "from previous_chapters import generate_text_simple\n",
+    "from previous_chapters import (\n",
+    "    generate_text_simple,\n",
+    "    text_to_token_ids,\n",
+    "    token_ids_to_text\n",
+    ")\n",
    "\n",
-    "start_context = \"Every effort moves you\"\n",
-    "\n",
-    "tokenizer = tiktoken.get_encoding(\"gpt2\")\n",
-    "encoded = tokenizer.encode(start_context)\n",
-    "encoded_tensor = torch.tensor(encoded).unsqueeze(0)\n",
+    "text_1 = \"Every effort moves you\"\n",
    "\n",
-    "out = generate_text_simple(\n",
+    "token_ids = generate_text_simple(\n",
    "    model=model,\n",
-    "    idx=encoded_tensor,\n",
+    "    idx=text_to_token_ids(text_1, tokenizer),\n",
    "    max_new_tokens=15,\n",
    "    context_size=BASE_CONFIG[\"context_length\"]\n",
    ")\n",
-    "decoded_text = tokenizer.decode(out.squeeze(0).tolist())\n",
    "\n",
-    "print(\"Output text:\", decoded_text)"
+    "print(token_ids_to_text(token_ids, tokenizer))"
   ]
  },
+ {
+  "cell_type": "markdown",
+  "id": "69162550-6a02-4ece-8db1-06c71d61946f",
+  "metadata": {},
+  "source": [
+   "- Before we finetune the model as a classifier, let's see if the model can perhaps already classify spam messages via prompting"
+  ]
+ },
+ {
+  "cell_type": "code",
+  "execution_count": 19,
+  "id": "94224aa9-c95a-4f8a-a420-76d01e3a800c",
+  "metadata": {},
+  "outputs": [
+   {
+    "name": "stdout",
+    "output_type": "stream",
+    "text": [
+     "Is the following text 'spam'? Answer with 'yes' or 'no': 'You are a winner you have been specially selected to receive $1000 cash or a $2000 award.' Answer with 'yes' or 'no'. Answer with 'yes' or 'no'. Answer with 'yes' or 'no'. Answer with 'yes'\n"
+    ]
+   }
+  ],
+  "source": [
+   "text_2 = (\n",
+   "    \"Is the following text 'spam'? Answer with 'yes' or 'no':\"\n",
+   "    \" 'You are a winner you have been specially\"\n",
+   "    \" selected to receive $1000 cash or a $2000 award.'\"\n",
+   "    \" Answer with 'yes' or 'no'.\"\n",
+   ")\n",
+   "\n",
+   "token_ids = generate_text_simple(\n",
+   "    model=model,\n",
+   "    idx=text_to_token_ids(text_2, tokenizer),\n",
+   "    max_new_tokens=23,\n",
+   "    context_size=BASE_CONFIG[\"context_length\"]\n",
+   ")\n",
+   "\n",
+   "print(token_ids_to_text(token_ids, tokenizer))"
+  ]
+ },
+ {
+  "cell_type": "markdown",
+  "id": "1ce39ed0-2c77-410d-8392-dd15d4b22016",
+  "metadata": {},
+  "source": [
+   "- As we can see, the model is not very good at following instructions\n",
+   "- This is expected, since it has only been pretrained and not instruction-finetuned (instruction finetuning will be covered in the next chapter)"
+  ]
+ },
  {
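Both generation cells in the hunk above call `generate_text_simple` from earlier chapters. Its core idea, repeatedly appending the most likely next token (greedy decoding), can be sketched independently of the model; the `next_logits` callback below is a stand-in assumption for a real forward pass, not the book's implementation:

```python
def greedy_generate(next_logits, token_ids, max_new_tokens):
    # next_logits(ids) is assumed to return one logit per vocabulary
    # entry for the next token; we append the argmax each step.
    for _ in range(max_new_tokens):
        logits = next_logits(token_ids)
        next_id = max(range(len(logits)), key=lambda i: logits[i])
        token_ids = token_ids + [next_id]
    return token_ids

# Toy "model" over a 3-token vocabulary that always favors token 2:
toy = lambda ids: [0.1, 0.2, 0.7]
greedy_generate(toy, [0], max_new_tokens=2)  # → [0, 2, 2]
```

Greedy decoding with a fixed-point prompt also explains the repetitive "Answer with 'yes' or 'no'." loop in the spam-prompt output: without instruction finetuning, the most likely continuation is often an echo of the prompt.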
@@ -865,7 +1007,15 @@
    "id": "4c9ae440-32f9-412f-96cf-fd52cc3e2522"
   },
   "source": [
-    "## 6.4 Adding a classification head"
+    "## 6.5 Adding a classification head"
   ]
  },
+ {
+  "cell_type": "markdown",
+  "id": "d6e9d66f-76b2-40fc-9ec5-3f972a8db9c0",
+  "metadata": {},
+  "source": [
+   "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/lm-head.webp\" width=500px>"
+  ]
+ },
  {
@@ -879,7 +1029,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 19,
+   "execution_count": 20,
   "id": "b23aff91-6bd0-48da-88f6-353657e6c981",
   "metadata": {
    "colab": {
@@ -1149,7 +1299,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 20,
+   "execution_count": 21,
   "id": "fkMWFl-0etea",
   "metadata": {
    "id": "fkMWFl-0etea"
@@ -1171,7 +1321,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 21,
+   "execution_count": 22,
   "id": "7e759fa0-0f69-41be-b576-17e5f20e04cb",
   "metadata": {},
   "outputs": [],
@@ -1192,9 +1342,17 @@
    "- So, we are also making the last transformer block and the final `LayerNorm` module connecting the last transformer block to the output layer trainable"
   ]
  },
+ {
+  "cell_type": "markdown",
+  "id": "0be7c1eb-c46c-4065-8525-eea1b8c66d10",
+  "metadata": {},
+  "source": [
+   "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/trainable.webp\" width=500px>"
+  ]
+ },
  {
   "cell_type": "code",
-   "execution_count": 22,
+   "execution_count": 23,
   "id": "2aedc120-5ee3-48f6-92f2-ad9304ebcdc7",
   "metadata": {
    "id": "2aedc120-5ee3-48f6-92f2-ad9304ebcdc7"
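The selective-finetuning idea in the cell above (freeze everything, then make only the output head, the final `LayerNorm`, and the last transformer block trainable) can be illustrated without PyTorch. The parameter names below are hypothetical stand-ins, not the model's real parameter names:

```python
# Hypothetical parameter names; True means the parameter stays trainable
# (the analogue of setting requires_grad = True in PyTorch).
trainable = {name: False for name in [
    "tok_emb.weight",
    "trf_blocks.0.att.W_query",
    "trf_blocks.11.att.W_query",   # last transformer block
    "final_norm.scale",
    "out_head.weight",             # new classification head
]}

# Unfreeze only the classification head, final LayerNorm, and last block.
for name in trainable:
    if name.startswith(("out_head", "final_norm", "trf_blocks.11")):
        trainable[name] = True
```

Training only these layers greatly reduces the number of updated parameters while still letting the model adapt its top-most representations to the spam task.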
@@ -1219,7 +1377,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 23,
+   "execution_count": 24,
   "id": "f645c06a-7df6-451c-ad3f-eafb18224ebc",
   "metadata": {
    "colab": {
@@ -1233,13 +1391,13 @@
     "name": "stdout",
     "output_type": "stream",
     "text": [
-      "Inputs: tensor([[  40, 1107, 8288,  428, 3807,   13]])\n",
-      "Inputs dimensions: torch.Size([1, 6])\n"
+      "Inputs: tensor([[5211,  345,  423,  640]])\n",
+      "Inputs dimensions: torch.Size([1, 4])\n"
     ]
    }
   ],
   "source": [
-    "inputs = tokenizer.encode(\"I really liked this movie.\")\n",
+    "inputs = tokenizer.encode(\"Do you have time\")\n",
    "inputs = torch.tensor(inputs).unsqueeze(0)\n",
    "print(\"Inputs:\", inputs)\n",
    "print(\"Inputs dimensions:\", inputs.shape) # shape: (batch_size, num_tokens)"
@@ -1255,7 +1413,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 24,
+   "execution_count": 25,
   "id": "48dc84f1-85cc-4609-9cee-94ff539f00f4",
   "metadata": {
    "colab": {
@@ -1270,13 +1428,11 @@
     "output_type": "stream",
     "text": [
      "Outputs:\n",
-      " tensor([[[-1.9044,  1.5321],\n",
-      "         [-4.9851,  8.5136],\n",
-      "         [-1.6985,  4.6314],\n",
-      "         [-2.3820,  5.7547],\n",
-      "         [-3.8736,  4.4867],\n",
-      "         [-5.7543,  5.3615]]])\n",
-      "Outputs dimensions: torch.Size([1, 6, 2])\n"
+      " tensor([[[-1.5854,  0.9904],\n",
+      "         [-3.7235,  7.4548],\n",
+      "         [-2.2661,  6.6049],\n",
+      "         [-3.5983,  3.9902]]])\n",
+      "Outputs dimensions: torch.Size([1, 4, 2])\n"
     ]
    }
   ],
@@ -1288,6 +1444,14 @@
    "print(\"Outputs dimensions:\", outputs.shape) # shape: (batch_size, num_tokens, num_classes)"
   ]
  },
+ {
+  "cell_type": "markdown",
+  "id": "7df9144f-6817-4be4-8d4b-5d4dadfe4a9b",
+  "metadata": {},
+  "source": [
+   "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/input-and-output.webp\" width=500px>"
+  ]
+ },
  {
   "cell_type": "markdown",
   "id": "e3bb8616-c791-4f5c-bac0-5302f663e46a",
@@ -1325,12 +1489,28 @@
    "print(\"Last output token:\", outputs[:, -1, :])"
   ]
  },
+ {
+  "cell_type": "markdown",
+  "id": "8df08ae0-e664-4670-b7c5-8a2280d9b41b",
+  "metadata": {},
+  "source": [
+   "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/attention-mask.webp\" width=200px>"
+  ]
+ },
  {
   "cell_type": "markdown",
   "id": "32aa4aef-e1e9-491b-9adf-5aa973e59b8c",
   "metadata": {},
   "source": [
-    "## 6.5 Calculating the classification loss and accuracy"
+    "## 6.6 Calculating the classification loss and accuracy"
   ]
  },
+ {
+  "cell_type": "markdown",
+  "id": "669e1fd1-ace8-44b4-b438-185ed0ba8b33",
+  "metadata": {},
+  "source": [
+   "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/overview-3.webp\" width=500px>"
+  ]
+ },
  {
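As the `outputs[:, -1, :]` line above shows, only the last token's logits are used for classification. Stripped of PyTorch, the prediction step amounts to an argmax over the final row of each sequence's logits. The sketch below uses the (1, 4, 2)-shaped example output printed earlier; mapping class index 1 to "spam" is an assumption for illustration:

```python
def predict_labels(batch_logits):
    # batch_logits: one list per sequence; each entry holds per-token
    # two-class logits. Keep only the last token's logits (analogous
    # to outputs[:, -1, :]) and take the argmax as the class index.
    predictions = []
    for sequence_logits in batch_logits:
        last = sequence_logits[-1]
        predictions.append(max(range(len(last)), key=lambda i: last[i]))
    return predictions

# Logits shaped like the torch.Size([1, 4, 2]) example output above
outputs = [[[-1.5854, 0.9904], [-3.7235, 7.4548],
            [-2.2661, 6.6049], [-3.5983, 3.9902]]]
predict_labels(outputs)  # → [1]
```

Comparing such predictions against the true labels over a data loader is what the accuracy calculation in this section does batch by batch.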
@@ -1545,7 +1725,7 @@
    "id": "456ae0fd-6261-42b4-ab6a-d24289953083"
   },
   "source": [
-    "## 6.6 Finetuning the model on supervised data"
+    "## 6.7 Finetuning the model on supervised data"
   ]
  },
  {
@@ -1560,6 +1740,14 @@
    "  2. calculate the accuracy after each epoch instead of printing a sample text after each epoch"
   ]
  },
+ {
+  "cell_type": "markdown",
+  "id": "979b6222-1dc2-4530-9d01-b6b04fe3de12",
+  "metadata": {},
+  "source": [
+   "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/training-loop.webp\" width=500px>"
+  ]
+ },
  {
   "cell_type": "code",
   "execution_count": 31,
@@ -1868,7 +2056,15 @@
   "id": "a74d9ad7-3ec1-450e-8c9f-4fc46d3d5bb0",
   "metadata": {},
   "source": [
-    "## 6.7 Using the LLM as a SPAM classifier"
+    "## 6.8 Using the LLM as a SPAM classifier"
   ]
  },
+ {
+  "cell_type": "markdown",
+  "id": "72ebcfa2-479e-408b-9cf0-7421f6144855",
+  "metadata": {},
+  "source": [
+   "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/overview-4.webp\" width=500px>"
+  ]
+ },
  {
@@ -2069,7 +2265,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-   "version": "3.10.6"
+   "version": "3.11.4"
  }
 },
 "nbformat": 4,