Add figures for ch06 (#141)

Sebastian Raschka 2024-05-05 07:10:04 -05:00 committed by GitHub
parent b8324061d0
commit d3201f5aad


@@ -25,7 +25,7 @@
},
{
"cell_type": "code",
-"execution_count": 2,
+"execution_count": 1,
"id": "5b7e01c2-1c84-4f2a-bb51-2e0b74abda90",
"metadata": {
"colab": {
@@ -62,6 +62,14 @@
" print(f\"{p} version: {version(p)}\")"
]
},
{
"cell_type": "markdown",
"id": "a445828a-ff10-4efa-9f60-a2e2aed4c87d",
"metadata": {},
"source": [
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/chapter-overview.webp\" width=500px>"
]
},
{
"cell_type": "markdown",
"id": "3a84cf35-b37f-4c15-8972-dfafc9fadc1c",
@@ -82,6 +90,42 @@
"- No code in this section"
]
},
{
"cell_type": "markdown",
"id": "ac45579d-d485-47dc-829e-43be7f4db57b",
"metadata": {},
"source": [
"- The most common ways to finetune language models are instruction-finetuning and classification finetuning\n",
"- Instruction-finetuning, depicted below, is the topic of the next chapter"
]
},
{
"cell_type": "markdown",
"id": "6c29ef42-46d9-43d4-8bb4-94974e1665e4",
"metadata": {},
"source": [
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/instructions.webp\" width=500px>"
]
},
{
"cell_type": "markdown",
"id": "a7f60321-95b8-46a9-97bf-1d07fda2c3dd",
"metadata": {},
"source": [
"- Classification finetuning, the topic of this chapter, is a procedure you may already be familiar with if you have a background in machine learning -- it's similar to training a convolutional network to classify handwritten digits, for example\n",
"- In classification finetuning, we have a specific number of class labels (for example, \"spam\" and \"not spam\") that the model can output\n",
"- A classification finetuned model can only predict classes it has seen during training (for example, \"spam\" or \"not spam\"), whereas an instruction-finetuned model can usually perform many tasks\n",
"- We can think of a classification-finetuned model as a very specialized model; in practice, it is much easier to create a specialized model than a generalist model that performs well on many different tasks"
]
},
{
"cell_type": "markdown",
"id": "0b37a0c4-0bb1-4061-b1fe-eaa4416d52c3",
"metadata": {},
"source": [
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/spam-non-spam.webp\" width=500px>"
]
},
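The fixed set of output labels described above can be illustrated with a minimal sketch in plain Python; the logit values and the label order (index 0 = "not spam", index 1 = "spam") are made-up assumptions for illustration:

```python
# A classification-finetuned model emits one score (logit) per class.
# Label order is an illustrative assumption: index 0 = "not spam", index 1 = "spam".
logits = [-3.7, 7.5]  # hypothetical model output for one text message

label_names = ["not spam", "spam"]
predicted = label_names[logits.index(max(logits))]
print(predicted)  # prints: spam
```

Whatever the message says, the model can only ever answer with one of these predefined labels, which is what makes it a specialized classifier rather than a generalist.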
{
"cell_type": "markdown",
"id": "8c7017a2-32aa-4002-a2f3-12aac293ccdf",
@@ -92,6 +136,14 @@
"## 6.2 Preparing the dataset"
]
},
{
"cell_type": "markdown",
"id": "5f628975-d2e8-4f7f-ab38-92bb868b7067",
"metadata": {},
"source": [
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/overview-1.webp\" width=500px>"
]
},
{
"cell_type": "markdown",
"id": "9fbd459f-63fa-4d8c-8499-e23103156c7d",
@@ -106,7 +158,7 @@
},
{
"cell_type": "code",
-"execution_count": 3,
+"execution_count": 2,
"id": "def7c09b-af9c-4216-90ce-5e67aed1065c",
"metadata": {
"colab": {
@@ -169,7 +221,7 @@
},
{
"cell_type": "code",
-"execution_count": 4,
+"execution_count": 3,
"id": "da0ed4da-ac31-4e4d-8bdd-2153be4656a4",
"metadata": {
"colab": {
@@ -283,7 +335,7 @@
"[5572 rows x 2 columns]"
]
},
-"execution_count": 4,
+"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
@@ -307,7 +359,7 @@
},
{
"cell_type": "code",
-"execution_count": 5,
+"execution_count": 4,
"id": "495a5280-9d7c-41d4-9719-64ab99056d4c",
"metadata": {
"colab": {
@@ -345,7 +397,7 @@
},
{
"cell_type": "code",
-"execution_count": 6,
+"execution_count": 5,
"id": "7be4a0a2-9704-4a96-b38f-240339818688",
"metadata": {
"colab": {
@@ -396,7 +448,7 @@
},
{
"cell_type": "code",
-"execution_count": 7,
+"execution_count": 6,
"id": "c1b10c3d-5d57-42d0-8de8-cf80a06f5ffd",
"metadata": {
"id": "c1b10c3d-5d57-42d0-8de8-cf80a06f5ffd"
@@ -418,7 +470,7 @@
},
{
"cell_type": "code",
-"execution_count": 8,
+"execution_count": 7,
"id": "uQl0Psdmx15D",
"metadata": {
"id": "uQl0Psdmx15D"
@@ -448,6 +500,14 @@
"test_df.to_csv(\"test.csv\", index=None)"
]
},
{
"cell_type": "markdown",
"id": "a8d7a0c5-1d5f-458a-b685-3f49520b0094",
"metadata": {},
"source": [
"## 6.3 Creating data loaders"
]
},
{
"cell_type": "markdown",
"id": "7126108a-75e7-4862-b0fb-cbf59a18bb6c",
@@ -465,7 +525,7 @@
},
{
"cell_type": "code",
-"execution_count": 9,
+"execution_count": 8,
"id": "74c3c463-8763-4cc0-9320-41c7eaad8ab7",
"metadata": {
"colab": {
@@ -490,6 +550,27 @@
"print(tokenizer.encode(\"<|endoftext|>\", allowed_special={\"<|endoftext|>\"}))"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "0ff0f6b2-376b-4740-8858-55b60784be73",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[42, 13, 314, 481, 1908, 340, 757]"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tokenizer.encode(\"K. I will sent it again\")"
]
},
{
"cell_type": "markdown",
"id": "04f582ff-68bf-450e-bd87-5fb61afe431c",
@@ -500,6 +581,14 @@
"- The `SpamDataset` class below identifies the longest sequence in the training dataset and adds the padding token to the others to match that sequence length"
]
},
{
"cell_type": "markdown",
"id": "0829f33f-1428-4f22-9886-7fee633b3666",
"metadata": {},
"source": [
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/pad-input-sequences.webp\" width=500px>"
]
},
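The padding behavior that the `SpamDataset` cell above describes can be sketched roughly as follows. This is a simplified stand-in, not the class itself; the token-id lists are made up, and 50256 is the GPT-2 `<|endoftext|>` id used as the pad token:

```python
# Pad every tokenized message to the length of the longest one,
# using the GPT-2 <|endoftext|> token id (50256) as the pad token
PAD_TOKEN_ID = 50256

def pad_sequences(token_id_lists, pad_id=PAD_TOKEN_ID):
    max_len = max(len(seq) for seq in token_id_lists)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in token_id_lists]

batch = [[42, 13, 314], [5211, 345, 423, 640, 30]]  # made-up token ids
padded = pad_sequences(batch)
print(padded)  # both rows now have length 5
```

Padding to a uniform length is what allows the messages to be stacked into a single tensor for batched processing.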
{
"cell_type": "code",
"execution_count": 10,
@@ -611,6 +700,14 @@
"- Next, we use the dataset to instantiate the data loaders, which is similar to creating the data loaders in previous chapters:"
]
},
{
"cell_type": "markdown",
"id": "64bcc349-205f-48f8-9655-95ff21f5e72f",
"metadata": {},
"source": [
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/batch.webp\" width=500px>"
]
},
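The data-loader setup can be sketched with random tensors standing in for the actual `SpamDataset`; the shapes, seed, and batch size below are illustrative assumptions, not the chapter's exact values:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(123)

# Stand-in data: 8 made-up "messages" of 5 token ids each, plus binary labels
inputs = torch.randint(0, 50257, (8, 5))  # token ids in the GPT-2 vocab range
labels = torch.randint(0, 2, (8,))        # 0 = "not spam", 1 = "spam"

loader = DataLoader(TensorDataset(inputs, labels),
                    batch_size=4, shuffle=True, drop_last=True)

for input_batch, label_batch in loader:
    print(input_batch.shape, label_batch.shape)
```

With `drop_last=True`, a trailing batch smaller than `batch_size` is discarded, which keeps the per-batch shapes uniform during training.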
{
"cell_type": "code",
"execution_count": 13,
@@ -730,7 +827,7 @@
"id": "d1c4f61a-5f5d-4b3b-97cf-151b617d1d6c"
},
"source": [
-"## 6.3 Initializing a model with pretrained weights"
+"## 6.4 Initializing a model with pretrained weights"
]
},
{
@@ -738,7 +835,9 @@
"id": "97e1af8b-8bd1-4b44-8b8b-dc031496e208",
"metadata": {},
"source": [
-"- In this section, we initialize the pretrained model we worked with in the previous chapter"
+"- In this section, we initialize the pretrained model we worked with in the previous chapter\n",
+"\n",
+"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/overview-2.webp\" width=500px>"
]
},
@@ -819,43 +918,86 @@
{
"cell_type": "code",
"execution_count": 18,
-"id": "fe4af171-5dce-4f6e-9b63-1e4e16e8b94c",
-"metadata": {
-"colab": {
-"base_uri": "https://localhost:8080/"
-},
-"id": "fe4af171-5dce-4f6e-9b63-1e4e16e8b94c",
-"outputId": "8ff3ec54-1dc3-4930-9be6-8eeaf560f8d4"
-},
+"id": "d8ac25ff-74b1-4149-8dc5-4c429d464330",
+"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
-"Output text: Every effort moves you forward.\n",
+"Every effort moves you forward.\n",
"\n",
"The first step is to understand the importance of your work\n"
]
}
],
"source": [
-"from previous_chapters import generate_text_simple\n",
+"from previous_chapters import (\n",
+" generate_text_simple,\n",
+" text_to_token_ids,\n",
+" token_ids_to_text\n",
+")\n",
"\n",
-"start_context = \"Every effort moves you\"\n",
+"text_1 = \"Every effort moves you\"\n",
"\n",
-"tokenizer = tiktoken.get_encoding(\"gpt2\")\n",
-"encoded = tokenizer.encode(start_context)\n",
-"encoded_tensor = torch.tensor(encoded).unsqueeze(0)\n",
-"\n",
-"out = generate_text_simple(\n",
+"token_ids = generate_text_simple(\n",
" model=model,\n",
-" idx=encoded_tensor,\n",
+" idx=text_to_token_ids(text_1, tokenizer),\n",
" max_new_tokens=15,\n",
" context_size=BASE_CONFIG[\"context_length\"]\n",
")\n",
-"decoded_text = tokenizer.decode(out.squeeze(0).tolist())\n",
"\n",
-"print(\"Output text:\", decoded_text)"
+"print(token_ids_to_text(token_ids, tokenizer))"
]
},
{
"cell_type": "markdown",
"id": "69162550-6a02-4ece-8db1-06c71d61946f",
"metadata": {},
"source": [
"- Before we finetune the model as a classifier, let's see if the model can perhaps already classify spam messages via prompting"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "94224aa9-c95a-4f8a-a420-76d01e3a800c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Is the following text 'spam'? Answer with 'yes' or 'no': 'You are a winner you have been specially selected to receive $1000 cash or a $2000 award.' Answer with 'yes' or 'no'. Answer with 'yes' or 'no'. Answer with 'yes' or 'no'. Answer with 'yes'\n"
]
}
],
"source": [
"text_2 = (\n",
" \"Is the following text 'spam'? Answer with 'yes' or 'no':\"\n",
" \" 'You are a winner you have been specially\"\n",
" \" selected to receive $1000 cash or a $2000 award.'\"\n",
" \" Answer with 'yes' or 'no'.\"\n",
")\n",
"\n",
"token_ids = generate_text_simple(\n",
" model=model,\n",
" idx=text_to_token_ids(text_2, tokenizer),\n",
" max_new_tokens=23,\n",
" context_size=BASE_CONFIG[\"context_length\"]\n",
")\n",
"\n",
"print(token_ids_to_text(token_ids, tokenizer))"
]
},
{
"cell_type": "markdown",
"id": "1ce39ed0-2c77-410d-8392-dd15d4b22016",
"metadata": {},
"source": [
"- As we can see, the model is not very good at following instructions\n",
"- This is expected, since it has only been pretrained and not instruction-finetuned (instruction finetuning will be covered in the next chapter)"
] ]
}, },
{
@@ -865,7 +1007,15 @@
"id": "4c9ae440-32f9-412f-96cf-fd52cc3e2522"
},
"source": [
-"## 6.4 Adding a classification head"
+"## 6.5 Adding a classification head"
]
},
{
"cell_type": "markdown",
"id": "d6e9d66f-76b2-40fc-9ec5-3f972a8db9c0",
"metadata": {},
"source": [
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/lm-head.webp\" width=500px>"
] ]
}, },
{
@@ -879,7 +1029,7 @@
},
{
"cell_type": "code",
-"execution_count": 19,
+"execution_count": 20,
"id": "b23aff91-6bd0-48da-88f6-353657e6c981",
"metadata": {
"colab": {
@@ -1149,7 +1299,7 @@
},
{
"cell_type": "code",
-"execution_count": 20,
+"execution_count": 21,
"id": "fkMWFl-0etea",
"metadata": {
"id": "fkMWFl-0etea"
@@ -1171,7 +1321,7 @@
},
{
"cell_type": "code",
-"execution_count": 21,
+"execution_count": 22,
"id": "7e759fa0-0f69-41be-b576-17e5f20e04cb",
"metadata": {},
"outputs": [],
@@ -1192,9 +1342,17 @@
"- So, we are also making the last transformer block and the final `LayerNorm` module connecting the last transformer block to the output layer trainable"
]
},
{
"cell_type": "markdown",
"id": "0be7c1eb-c46c-4065-8525-eea1b8c66d10",
"metadata": {},
"source": [
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/trainable.webp\" width=500px>"
]
},
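The freezing scheme described above can be sketched with a toy module in place of the real GPT model; the layer layout below is an illustrative stand-in, not the book's actual `GPTModel` structure:

```python
import torch.nn as nn

# Toy stand-in: two "blocks" plus a final LayerNorm
model = nn.Sequential(
    nn.Linear(8, 8),  # stands in for the earlier transformer blocks
    nn.Linear(8, 8),  # stands in for the last transformer block
    nn.LayerNorm(8),  # stands in for the final LayerNorm
)

# 1) Freeze everything
for param in model.parameters():
    param.requires_grad = False

# 2) Unfreeze only the last "block" and the final LayerNorm
for module in (model[1], model[2]):
    for param in module.parameters():
        param.requires_grad = True

num_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(num_trainable)  # 72 Linear weights+biases plus 16 LayerNorm parameters = 88
```

Parameters with `requires_grad = False` receive no gradients, so the optimizer leaves them untouched and only the unfrozen layers adapt during finetuning.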
{
"cell_type": "code",
-"execution_count": 22,
+"execution_count": 23,
"id": "2aedc120-5ee3-48f6-92f2-ad9304ebcdc7",
"metadata": {
"id": "2aedc120-5ee3-48f6-92f2-ad9304ebcdc7"
@@ -1219,7 +1377,7 @@
},
{
"cell_type": "code",
-"execution_count": 23,
+"execution_count": 24,
"id": "f645c06a-7df6-451c-ad3f-eafb18224ebc",
"metadata": {
"colab": {
@@ -1233,13 +1391,13 @@
"name": "stdout",
"output_type": "stream",
"text": [
-"Inputs: tensor([[ 40, 1107, 8288, 428, 3807, 13]])\n",
-"Inputs dimensions: torch.Size([1, 6])\n"
+"Inputs: tensor([[5211, 345, 423, 640]])\n",
+"Inputs dimensions: torch.Size([1, 4])\n"
]
}
],
"source": [
-"inputs = tokenizer.encode(\"I really liked this movie.\")\n",
+"inputs = tokenizer.encode(\"Do you have time\")\n",
"inputs = torch.tensor(inputs).unsqueeze(0)\n",
"print(\"Inputs:\", inputs)\n",
"print(\"Inputs dimensions:\", inputs.shape) # shape: (batch_size, num_tokens)"
@@ -1255,7 +1413,7 @@
},
{
"cell_type": "code",
-"execution_count": 24,
+"execution_count": 25,
"id": "48dc84f1-85cc-4609-9cee-94ff539f00f4",
"metadata": {
"colab": {
@@ -1270,13 +1428,11 @@
"output_type": "stream",
"text": [
"Outputs:\n",
-" tensor([[[-1.9044, 1.5321],\n",
-" [-4.9851, 8.5136],\n",
-" [-1.6985, 4.6314],\n",
-" [-2.3820, 5.7547],\n",
-" [-3.8736, 4.4867],\n",
-" [-5.7543, 5.3615]]])\n",
-"Outputs dimensions: torch.Size([1, 6, 2])\n"
+" tensor([[[-1.5854, 0.9904],\n",
+" [-3.7235, 7.4548],\n",
+" [-2.2661, 6.6049],\n",
+" [-3.5983, 3.9902]]])\n",
+"Outputs dimensions: torch.Size([1, 4, 2])\n"
]
}
],
@@ -1288,6 +1444,14 @@
"print(\"Outputs dimensions:\", outputs.shape) # shape: (batch_size, num_tokens, num_classes)"
]
},
{
"cell_type": "markdown",
"id": "7df9144f-6817-4be4-8d4b-5d4dadfe4a9b",
"metadata": {},
"source": [
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/input-and-output.webp\" width=500px>"
]
},
{
"cell_type": "markdown",
"id": "e3bb8616-c791-4f5c-bac0-5302f663e46a",
@@ -1325,12 +1489,28 @@
"print(\"Last output token:\", outputs[:, -1, :])"
]
},
{
"cell_type": "markdown",
"id": "8df08ae0-e664-4670-b7c5-8a2280d9b41b",
"metadata": {},
"source": [
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/attention-mask.webp\" width=200px>"
]
},
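Reading the class prediction off the last token position, as in the cell above, can be sketched as follows; the logit values are made up, and the label order (index 0 = "not spam", index 1 = "spam") is an assumption:

```python
import torch

# Hypothetical per-token class logits, shape (batch_size, num_tokens, num_classes)
outputs = torch.tensor([[[-1.6, 1.0],
                         [-3.7, 7.5],
                         [-2.3, 6.6],
                         [-3.6, 4.0]]])

last_token_logits = outputs[:, -1, :]           # shape (batch_size, num_classes)
predicted_class = torch.argmax(last_token_logits, dim=-1)
print(predicted_class.item())  # 1, i.e. "spam" under the assumed label order
```

The last token is used because, with causal attention, it is the only position whose representation has attended to every token in the sequence.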
{
"cell_type": "markdown",
"id": "32aa4aef-e1e9-491b-9adf-5aa973e59b8c",
"metadata": {},
"source": [
-"## 6.5 Calculating the classification loss and accuracy"
+"## 6.6 Calculating the classification loss and accuracy"
]
},
{
"cell_type": "markdown",
"id": "669e1fd1-ace8-44b4-b438-185ed0ba8b33",
"metadata": {},
"source": [
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/overview-3.webp\" width=500px>"
] ]
}, },
{
@@ -1545,7 +1725,7 @@
"id": "456ae0fd-6261-42b4-ab6a-d24289953083"
},
"source": [
-"## 6.6 Finetuning the model on supervised data"
+"## 6.7 Finetuning the model on supervised data"
]
},
{
@@ -1560,6 +1740,14 @@
" 2. calculate the accuracy after each epoch instead of printing a sample text after each epoch"
]
},
{
"cell_type": "markdown",
"id": "979b6222-1dc2-4530-9d01-b6b04fe3de12",
"metadata": {},
"source": [
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/training-loop.webp\" width=500px>"
]
},
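The second modification mentioned above, tracking classification accuracy, can be sketched with a minimal helper; the logits and labels are made up, and the helper assumes the logits were already taken from the last token position:

```python
import torch

def batch_accuracy(logits, targets):
    # logits: (batch_size, num_classes); targets: (batch_size,) integer labels
    predictions = torch.argmax(logits, dim=-1)
    return (predictions == targets).float().mean().item()

# Made-up batch: four examples, two classes
logits = torch.tensor([[2.0, -1.0], [0.5, 3.0], [1.0, 0.0], [-2.0, 4.0]])
targets = torch.tensor([0, 1, 1, 1])
print(batch_accuracy(logits, targets))  # 0.75: three of four predictions match
```

Unlike the cross-entropy loss, accuracy is not differentiable, so it is only reported for monitoring while the loss drives the gradient updates.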
{
"cell_type": "code",
"execution_count": 31,
@@ -1868,7 +2056,15 @@
"id": "a74d9ad7-3ec1-450e-8c9f-4fc46d3d5bb0",
"metadata": {},
"source": [
-"## 6.7 Using the LLM as a SPAM classifier"
+"## 6.8 Using the LLM as a SPAM classifier"
]
},
{
"cell_type": "markdown",
"id": "72ebcfa2-479e-408b-9cf0-7421f6144855",
"metadata": {},
"source": [
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/overview-4.webp\" width=500px>"
] ]
}, },
{
@@ -2069,7 +2265,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-"version": "3.10.6"
+"version": "3.11.4"
}
},
"nbformat": 4,