Add figures for ch06 (#141)

Author: Sebastian Raschka
Commit: c6fcadb087
Parent: f917fc76fe
@@ -25,7 +25,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 2,
+   "execution_count": 1,
    "id": "5b7e01c2-1c84-4f2a-bb51-2e0b74abda90",
    "metadata": {
     "colab": {
@@ -62,6 +62,14 @@
     "    print(f\"{p} version: {version(p)}\")"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "a445828a-ff10-4efa-9f60-a2e2aed4c87d",
+   "metadata": {},
+   "source": [
+    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/chapter-overview.webp\" width=500px>"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "3a84cf35-b37f-4c15-8972-dfafc9fadc1c",
@@ -82,6 +90,42 @@
     "- No code in this section"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "ac45579d-d485-47dc-829e-43be7f4db57b",
+   "metadata": {},
+   "source": [
+    "- The most common ways to finetune language models are instruction-finetuning and classification finetuning\n",
+    "- Instruction-finetuning, depicted below, is the topic of the next chapter"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6c29ef42-46d9-43d4-8bb4-94974e1665e4",
+   "metadata": {},
+   "source": [
+    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/instructions.webp\" width=500px>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a7f60321-95b8-46a9-97bf-1d07fda2c3dd",
+   "metadata": {},
+   "source": [
+    "- Classification finetuning, the topic of this chapter, is a procedure you may already be familiar with if you have a background in machine learning -- it's similar to training a convolutional network to classify handwritten digits, for example\n",
+    "- In classification finetuning, we have a specific number of class labels (for example, \"spam\" and \"not spam\") that the model can output\n",
+    "- A classification-finetuned model can only predict classes it has seen during training (for example, \"spam\" or \"not spam\"), whereas an instruction-finetuned model can usually perform many tasks\n",
+    "- We can think of a classification-finetuned model as a very specialized model; in practice, it is much easier to create a specialized model than a generalist model that performs well on many different tasks"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0b37a0c4-0bb1-4061-b1fe-eaa4416d52c3",
+   "metadata": {},
+   "source": [
+    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/spam-non-spam.webp\" width=500px>"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "8c7017a2-32aa-4002-a2f3-12aac293ccdf",
@@ -92,6 +136,14 @@
     "## 6.2 Preparing the dataset"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "5f628975-d2e8-4f7f-ab38-92bb868b7067",
+   "metadata": {},
+   "source": [
+    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/overview-1.webp\" width=500px>"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "9fbd459f-63fa-4d8c-8499-e23103156c7d",
@@ -106,7 +158,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 3,
+   "execution_count": 2,
    "id": "def7c09b-af9c-4216-90ce-5e67aed1065c",
    "metadata": {
     "colab": {
@@ -169,7 +221,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": 3,
    "id": "da0ed4da-ac31-4e4d-8bdd-2153be4656a4",
    "metadata": {
     "colab": {
@@ -283,7 +335,7 @@
        "[5572 rows x 2 columns]"
       ]
      },
-     "execution_count": 4,
+     "execution_count": 3,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -307,7 +359,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": 4,
    "id": "495a5280-9d7c-41d4-9719-64ab99056d4c",
    "metadata": {
     "colab": {
@@ -345,7 +397,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": 5,
    "id": "7be4a0a2-9704-4a96-b38f-240339818688",
    "metadata": {
     "colab": {
@@ -396,7 +448,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 7,
+   "execution_count": 6,
    "id": "c1b10c3d-5d57-42d0-8de8-cf80a06f5ffd",
    "metadata": {
     "id": "c1b10c3d-5d57-42d0-8de8-cf80a06f5ffd"
@@ -418,7 +470,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 8,
+   "execution_count": 7,
    "id": "uQl0Psdmx15D",
    "metadata": {
     "id": "uQl0Psdmx15D"
@@ -448,6 +500,14 @@
     "test_df.to_csv(\"test.csv\", index=None)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "a8d7a0c5-1d5f-458a-b685-3f49520b0094",
+   "metadata": {},
+   "source": [
+    "## 6.3 Creating data loaders"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "7126108a-75e7-4862-b0fb-cbf59a18bb6c",
@@ -465,7 +525,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 9,
+   "execution_count": 8,
    "id": "74c3c463-8763-4cc0-9320-41c7eaad8ab7",
    "metadata": {
     "colab": {
@@ -490,6 +550,27 @@
     "print(tokenizer.encode(\"<|endoftext|>\", allowed_special={\"<|endoftext|>\"}))"
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "id": "0ff0f6b2-376b-4740-8858-55b60784be73",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[42, 13, 314, 481, 1908, 340, 757]"
+      ]
+     },
+     "execution_count": 9,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "tokenizer.encode(\"K. I will sent it again\")"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "04f582ff-68bf-450e-bd87-5fb61afe431c",
@@ -500,6 +581,14 @@
     "- The `SpamDataset` class below identifies the longest sequence in the training dataset and adds the padding token to the others to match that sequence length"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "0829f33f-1428-4f22-9886-7fee633b3666",
+   "metadata": {},
+   "source": [
+    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/pad-input-sequences.webp\" width=500px>"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 10,
@@ -611,6 +700,14 @@
     "- Next, we use the dataset to instantiate the data loaders, which is similar to creating the data loaders in previous chapters:"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "64bcc349-205f-48f8-9655-95ff21f5e72f",
+   "metadata": {},
+   "source": [
+    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/batch.webp\" width=500px>"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 13,
@@ -730,7 +827,7 @@
     "id": "d1c4f61a-5f5d-4b3b-97cf-151b617d1d6c"
    },
    "source": [
-    "## 6.3 Initializing a model with pretrained weights"
+    "## 6.4 Initializing a model with pretrained weights"
    ]
   },
   {
@@ -738,7 +835,9 @@
    "id": "97e1af8b-8bd1-4b44-8b8b-dc031496e208",
    "metadata": {},
    "source": [
-    "- In this section, we initialize the pretrained model we worked with in the previous chapter"
+    "- In this section, we initialize the pretrained model we worked with in the previous chapter\n",
+    "\n",
+    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/overview-2.webp\" width=500px>"
    ]
   },
   {
@@ -819,43 +918,86 @@
   {
    "cell_type": "code",
    "execution_count": 18,
-   "id": "fe4af171-5dce-4f6e-9b63-1e4e16e8b94c",
-   "metadata": {
-    "colab": {
-     "base_uri": "https://localhost:8080/"
-    },
-    "id": "fe4af171-5dce-4f6e-9b63-1e4e16e8b94c",
-    "outputId": "8ff3ec54-1dc3-4930-9be6-8eeaf560f8d4"
-   },
+   "id": "d8ac25ff-74b1-4149-8dc5-4c429d464330",
+   "metadata": {},
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "Output text: Every effort moves you forward.\n",
+      "Every effort moves you forward.\n",
       "\n",
       "The first step is to understand the importance of your work\n"
      ]
     }
    ],
    "source": [
-    "from previous_chapters import generate_text_simple\n",
+    "from previous_chapters import (\n",
+    "    generate_text_simple,\n",
+    "    text_to_token_ids,\n",
+    "    token_ids_to_text\n",
+    ")\n",
     "\n",
-    "start_context = \"Every effort moves you\"\n",
     "\n",
-    "tokenizer = tiktoken.get_encoding(\"gpt2\")\n",
-    "encoded = tokenizer.encode(start_context)\n",
-    "encoded_tensor = torch.tensor(encoded).unsqueeze(0)\n",
+    "text_1 = \"Every effort moves you\"\n",
     "\n",
-    "out = generate_text_simple(\n",
+    "token_ids = generate_text_simple(\n",
     "    model=model,\n",
-    "    idx=encoded_tensor,\n",
+    "    idx=text_to_token_ids(text_1, tokenizer),\n",
     "    max_new_tokens=15,\n",
    "    context_size=BASE_CONFIG[\"context_length\"]\n",
     ")\n",
-    "decoded_text = tokenizer.decode(out.squeeze(0).tolist())\n",
     "\n",
-    "print(\"Output text:\", decoded_text)"
+    "print(token_ids_to_text(token_ids, tokenizer))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "69162550-6a02-4ece-8db1-06c71d61946f",
+   "metadata": {},
+   "source": [
+    "- Before we finetune the model as a classifier, let's see if the model can perhaps already classify spam messages via prompting"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 19,
+   "id": "94224aa9-c95a-4f8a-a420-76d01e3a800c",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Is the following text 'spam'? Answer with 'yes' or 'no': 'You are a winner you have been specially selected to receive $1000 cash or a $2000 award.' Answer with 'yes' or 'no'. Answer with 'yes' or 'no'. Answer with 'yes' or 'no'. Answer with 'yes'\n"
+     ]
+    }
+   ],
+   "source": [
+    "text_2 = (\n",
+    "    \"Is the following text 'spam'? Answer with 'yes' or 'no':\"\n",
+    "    \" 'You are a winner you have been specially\"\n",
+    "    \" selected to receive $1000 cash or a $2000 award.'\"\n",
+    "    \" Answer with 'yes' or 'no'.\"\n",
+    ")\n",
+    "\n",
+    "token_ids = generate_text_simple(\n",
+    "    model=model,\n",
+    "    idx=text_to_token_ids(text_2, tokenizer),\n",
+    "    max_new_tokens=23,\n",
+    "    context_size=BASE_CONFIG[\"context_length\"]\n",
+    ")\n",
+    "\n",
+    "print(token_ids_to_text(token_ids, tokenizer))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1ce39ed0-2c77-410d-8392-dd15d4b22016",
+   "metadata": {},
+   "source": [
+    "- As we can see, the model is not very good at following instructions\n",
+    "- This is expected, since it has only been pretrained and not instruction-finetuned (instruction finetuning will be covered in the next chapter)"
+   ]
+  },
    ]
   },
   {
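Editor's note: the refactored cell above replaces the manual encode/decode calls with the `text_to_token_ids` and `token_ids_to_text` helpers imported from `previous_chapters`. Judging from the lines removed in this hunk, they amount to roughly the following sketch (not necessarily the exact implementation):

    import torch

    def text_to_token_ids(text, tokenizer):
        # encode the text and add a batch dimension
        encoded = tokenizer.encode(text)
        return torch.tensor(encoded).unsqueeze(0)

    def token_ids_to_text(token_ids, tokenizer):
        # remove the batch dimension and decode back into a string
        return tokenizer.decode(token_ids.squeeze(0).tolist())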
@@ -865,7 +1007,15 @@
     "id": "4c9ae440-32f9-412f-96cf-fd52cc3e2522"
    },
    "source": [
-    "## 6.4 Adding a classification head"
+    "## 6.5 Adding a classification head"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d6e9d66f-76b2-40fc-9ec5-3f972a8db9c0",
+   "metadata": {},
+   "source": [
+    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/lm-head.webp\" width=500px>"
    ]
   },
   {
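Editor's note: section 6.5 swaps the model's vocabulary-sized output layer (the "LM head" shown in the new figure) for a small classification head. That code is not part of this diff excerpt, but in the chapter it boils down to something like the sketch below; `model`, `BASE_CONFIG`, and the `out_head` attribute name come from the notebook's earlier `GPTModel` setup.

    import torch

    torch.manual_seed(123)
    num_classes = 2  # "not spam" and "spam"

    # replace the (emb_dim -> vocab_size) output layer with an (emb_dim -> 2) head
    model.out_head = torch.nn.Linear(
        in_features=BASE_CONFIG["emb_dim"],
        out_features=num_classes
    )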
@@ -879,7 +1029,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 19,
+   "execution_count": 20,
    "id": "b23aff91-6bd0-48da-88f6-353657e6c981",
    "metadata": {
     "colab": {
@@ -1149,7 +1299,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 20,
+   "execution_count": 21,
    "id": "fkMWFl-0etea",
    "metadata": {
     "id": "fkMWFl-0etea"
@@ -1171,7 +1321,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 21,
+   "execution_count": 22,
    "id": "7e759fa0-0f69-41be-b576-17e5f20e04cb",
    "metadata": {},
    "outputs": [],
@@ -1192,9 +1342,17 @@
     "- So, we are also making the last transformer block and the final `LayerNorm` module connecting the last transformer block to the output layer trainable"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "0be7c1eb-c46c-4065-8525-eea1b8c66d10",
+   "metadata": {},
+   "source": [
+    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/trainable.webp\" width=500px>"
+   ]
+  },
   {
    "cell_type": "code",
-   "execution_count": 22,
+   "execution_count": 23,
    "id": "2aedc120-5ee3-48f6-92f2-ad9304ebcdc7",
    "metadata": {
     "id": "2aedc120-5ee3-48f6-92f2-ad9304ebcdc7"
@@ -1219,7 +1377,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 23,
+   "execution_count": 24,
    "id": "f645c06a-7df6-451c-ad3f-eafb18224ebc",
    "metadata": {
     "colab": {
@@ -1233,13 +1391,13 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "Inputs: tensor([[  40, 1107, 8288,  428, 3807,   13]])\n",
-      "Inputs dimensions: torch.Size([1, 6])\n"
+      "Inputs: tensor([[5211,  345,  423,  640]])\n",
+      "Inputs dimensions: torch.Size([1, 4])\n"
      ]
     }
    ],
    "source": [
-    "inputs = tokenizer.encode(\"I really liked this movie.\")\n",
+    "inputs = tokenizer.encode(\"Do you have time\")\n",
     "inputs = torch.tensor(inputs).unsqueeze(0)\n",
     "print(\"Inputs:\", inputs)\n",
     "print(\"Inputs dimensions:\", inputs.shape) # shape: (batch_size, num_tokens)"
@@ -1255,7 +1413,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 24,
+   "execution_count": 25,
    "id": "48dc84f1-85cc-4609-9cee-94ff539f00f4",
    "metadata": {
     "colab": {
@@ -1270,13 +1428,11 @@
      "output_type": "stream",
      "text": [
       "Outputs:\n",
-      " tensor([[[-1.9044,  1.5321],\n",
-      "         [-4.9851,  8.5136],\n",
-      "         [-1.6985,  4.6314],\n",
-      "         [-2.3820,  5.7547],\n",
-      "         [-3.8736,  4.4867],\n",
-      "         [-5.7543,  5.3615]]])\n",
-      "Outputs dimensions: torch.Size([1, 6, 2])\n"
+      " tensor([[[-1.5854,  0.9904],\n",
+      "         [-3.7235,  7.4548],\n",
+      "         [-2.2661,  6.6049],\n",
+      "         [-3.5983,  3.9902]]])\n",
+      "Outputs dimensions: torch.Size([1, 4, 2])\n"
      ]
     }
    ],
@@ -1288,6 +1444,14 @@
     "print(\"Outputs dimensions:\", outputs.shape) # shape: (batch_size, num_tokens, num_classes)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "7df9144f-6817-4be4-8d4b-5d4dadfe4a9b",
+   "metadata": {},
+   "source": [
+    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/input-and-output.webp\" width=500px>"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "e3bb8616-c791-4f5c-bac0-5302f663e46a",
@@ -1325,12 +1489,28 @@
     "print(\"Last output token:\", outputs[:, -1, :])"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "8df08ae0-e664-4670-b7c5-8a2280d9b41b",
+   "metadata": {},
+   "source": [
+    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/attention-mask.webp\" width=200px>"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "32aa4aef-e1e9-491b-9adf-5aa973e59b8c",
    "metadata": {},
    "source": [
-    "## 6.5 Calculating the classification loss and accuracy"
+    "## 6.6 Calculating the classification loss and accuracy"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "669e1fd1-ace8-44b4-b438-185ed0ba8b33",
+   "metadata": {},
+   "source": [
+    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/overview-3.webp\" width=500px>"
    ]
   },
   {
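Editor's note: because of the causal attention mask (see the new figure above), the last token is the only one that attends to all previous tokens, so its logits are the ones used for classification. Turning them into a class label is a small step; `outputs` is the tensor printed in the cells above.

    import torch

    # outputs has shape (batch_size, num_tokens, num_classes)
    last_token_logits = outputs[:, -1, :]            # shape: (batch_size, num_classes)
    predicted_label = torch.argmax(last_token_logits, dim=-1)

    print("Class label:", predicted_label.item())    # 0 -> "not spam", 1 -> "spam"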
@@ -1545,7 +1725,7 @@
     "id": "456ae0fd-6261-42b4-ab6a-d24289953083"
    },
    "source": [
-    "## 6.6 Finetuning the model on supervised data"
+    "## 6.7 Finetuning the model on supervised data"
    ]
   },
   {
@@ -1560,6 +1740,14 @@
     "  2. calculate the accuracy after each epoch instead of printing a sample text after each epoch"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "979b6222-1dc2-4530-9d01-b6b04fe3de12",
+   "metadata": {},
+   "source": [
+    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/training-loop.webp\" width=500px>"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 31,
@@ -1868,7 +2056,15 @@
    "id": "a74d9ad7-3ec1-450e-8c9f-4fc46d3d5bb0",
    "metadata": {},
    "source": [
-    "## 6.7 Using the LLM as a SPAM classifier"
+    "## 6.8 Using the LLM as a SPAM classifier"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "72ebcfa2-479e-408b-9cf0-7421f6144855",
+   "metadata": {},
+   "source": [
+    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/overview-4.webp\" width=500px>"
    ]
   },
   {
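Editor's note: once finetuned, the model is applied to new text by repeating the preprocessing used for training (tokenize, truncate, pad) and reading off the last-token prediction. A simplified sketch; the notebook's own helper is more careful, for example it also respects the model's supported context length.

    import torch

    def classify_text(text, model, tokenizer, device, max_length, pad_token_id=50256):
        model.eval()

        input_ids = tokenizer.encode(text)[:max_length]               # truncate
        input_ids += [pad_token_id] * (max_length - len(input_ids))   # pad
        input_tensor = torch.tensor(input_ids, device=device).unsqueeze(0)

        with torch.no_grad():
            logits = model(input_tensor)[:, -1, :]  # last-token logits
        predicted_label = torch.argmax(logits, dim=-1).item()

        return "spam" if predicted_label == 1 else "not spam"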
@@ -2069,7 +2265,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.10.6"
+   "version": "3.11.4"
   }
  },
  "nbformat": 4,