diff --git a/ch07/01_main-chapter-code/ch07.ipynb b/ch07/01_main-chapter-code/ch07.ipynb index 686fc7b..110d916 100644 --- a/ch07/01_main-chapter-code/ch07.ipynb +++ b/ch07/01_main-chapter-code/ch07.ipynb @@ -3,7 +3,9 @@ { "cell_type": "markdown", "id": "12e91914-5f51-43fa-b65b-625e73b4d17b", - "metadata": {}, + "metadata": { + "id": "12e91914-5f51-43fa-b65b-625e73b4d17b" + }, "source": [ "\n", "\n", @@ -23,7 +25,9 @@ { "cell_type": "markdown", "id": "c2520ec3-722f-4f44-bdd1-885b13e7afbf", - "metadata": {}, + "metadata": { + "id": "c2520ec3-722f-4f44-bdd1-885b13e7afbf" + }, "source": [ "# Chapter 7: Finetuning To Follow Instructions" ] @@ -37,17 +41,17 @@ "base_uri": "https://localhost:8080/" }, "id": "4e19327b-6c02-4881-ad02-9b6d3ec0b1b4", - "outputId": "538e79af-011b-4a60-f288-2d0312a2b5a6" + "outputId": "6560a9ce-8cbe-4c37-885b-e9c8c1946f69" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "matplotlib version: 3.8.2\n", - "tiktoken version: 0.5.1\n", - "torch version: 2.2.2\n", - "tqdm version: 4.66.1\n", + "matplotlib version: 3.7.1\n", + "tiktoken version: 0.7.0\n", + "torch version: 2.3.0+cu121\n", + "tqdm version: 4.66.4\n", "tensorflow version: 2.15.0\n" ] } @@ -66,10 +70,22 @@ " print(f\"{p} version: {version(p)}\")" ] }, + { + "cell_type": "markdown", + "id": "264fca98-2f9a-4193-b435-2abfa3b4142f", + "metadata": { + "id": "264fca98-2f9a-4193-b435-2abfa3b4142f" + }, + "source": [ + "[figure]" + ] + }, { "cell_type": "markdown", "id": "8bbc68e9-75b3-41f1-ac2c-e071c3cd0813", - "metadata": {}, + "metadata": { + "id": "8bbc68e9-75b3-41f1-ac2c-e071c3cd0813" + }, "source": [ "## 7.1 Introduction to instruction finetuning" ] @@ -77,7 +93,9 @@ { "cell_type": "markdown", "id": "53dba24a-6805-496c-9a7f-c75e2d3527ab", - "metadata": {}, + "metadata": { + "id": "53dba24a-6805-496c-9a7f-c75e2d3527ab" + }, "source": [ "- In chapter 5, we saw that pretraining an LLM involves a training procedure where it learns to generate one word at a time\n", "- 
Hence, a pretrained LLM is good at text completion, but it is not good at following instructions\n", @@ -87,27 +105,33 @@ { "cell_type": "markdown", "id": "18dc0535-0904-44ed-beaf-9b678292ef35", - "metadata": {}, + "metadata": { + "id": "18dc0535-0904-44ed-beaf-9b678292ef35" + }, "source": [ - "[insert figure]" + "[figure]" ] }, { "cell_type": "markdown", "id": "b4698b23-12e0-4bd7-a140-ccb3dd71d4e8", - "metadata": {}, + "metadata": { + "id": "b4698b23-12e0-4bd7-a140-ccb3dd71d4e8" + }, "source": [ "- An optional step after instruction finetuning is preference tuning, which refines the response style of an LLM; readers interested in preference tuning can find example code in the bonus materials: [../04_preference-tuning-with-dpo](../04_preference-tuning-with-dpo)\n", "\n", "- The topics covered in this chapter are summarized in the figure below\n", "\n", - "[insert figure]" + "[figure]" ] }, { "cell_type": "markdown", "id": "5384f0cf-ef3c-4436-a5fa-59bd25649f86", - "metadata": {}, + "metadata": { + "id": "5384f0cf-ef3c-4436-a5fa-59bd25649f86" + }, "source": [ "## 7.2 Preparing a dataset for supervised instruction finetuning" ] @@ -115,7 +139,9 @@ { "cell_type": "markdown", "id": "f8b34ff8-619f-4e89-bd03-ce513269760d", - "metadata": {}, + "metadata": { + "id": "f8b34ff8-619f-4e89-bd03-ce513269760d" + }, "source": [ "- We will work with an instruction dataset I prepared for this chapter" ] @@ -129,7 +155,7 @@ "base_uri": "https://localhost:8080/" }, "id": "0G3axLw6kY1N", - "outputId": "2a9a1c83-9c46-49a5-f9df-fce3320f7db2" + "outputId": "c48ade8c-0d31-4efb-8246-6e6c51669dde" }, "outputs": [ { @@ -173,7 +199,9 @@ { "cell_type": "markdown", "id": "d7af8176-4255-4e92-8c7d-998771733eb8", - "metadata": {}, + "metadata": { + "id": "d7af8176-4255-4e92-8c7d-998771733eb8" + }, "source": [ "- Each item in the `data` list we loaded from the JSON file above is a dictionary in the following form:" ] @@ -187,7 +215,7 @@ "base_uri": "https://localhost:8080/" }, "id": "-LiuBMsHkzQV", 
- "outputId": "fc3b22fd-9a53-405e-9c25-2a5873d343d1" + "outputId": "88fe5be1-da18-45b5-dbb5-abcbcc4558e5" }, "outputs": [ { @@ -207,7 +235,9 @@ { "cell_type": "markdown", "id": "c5a32b34-485a-4816-a77a-da14f9fe6e46", - "metadata": {}, + "metadata": { + "id": "c5a32b34-485a-4816-a77a-da14f9fe6e46" + }, "source": [ "- Note that the `'input'` field can be empty:" ] @@ -221,7 +251,7 @@ "base_uri": "https://localhost:8080/" }, "id": "uFInFxDDk2Je", - "outputId": "84cb1aad-233a-488a-f6b0-6cb977834367" + "outputId": "a07ca278-0205-4ac4-b81e-54a513ece585" }, "outputs": [ { @@ -241,7 +271,9 @@ { "cell_type": "markdown", "id": "f034799a-6575-45fd-98c9-9d1012d0fd58", - "metadata": {}, + "metadata": { + "id": "f034799a-6575-45fd-98c9-9d1012d0fd58" + }, "source": [ "- Instruction finetuning is often referred to as \"supervised instruction finetuning\" because it involves training a model on a dataset where the input-output pairs are explicitly provided\n", "- There are different ways to format the entries as inputs to the LLM; the figure below illustrates two example formats that were used for training the Alpaca (https://crfm.stanford.edu/2023/03/13/alpaca.html) and Phi-3 (https://arxiv.org/abs/2404.14219) LLMs, respectively" @@ -250,15 +282,19 @@ { "cell_type": "markdown", "id": "dffa4f70-44d4-4be4-89a9-2159f4885b10", - "metadata": {}, + "metadata": { + "id": "dffa4f70-44d4-4be4-89a9-2159f4885b10" + }, "source": [ - "[insert figure]" + "[figure]" ] }, { "cell_type": "markdown", "id": "dd79a74e-befb-491c-be49-f777a6a5b6a6", - "metadata": {}, + "metadata": { + "id": "dd79a74e-befb-491c-be49-f777a6a5b6a6" + }, "source": [ "- In this chapter, we use Alpaca-style prompt formatting, which was the original prompt template for instruction finetuning\n", "- Below we format the input that we will pass as input to the LLM" @@ -285,6 +321,16 @@ " return instruction_text + input_text" ] }, + { + "cell_type": "markdown", + "id": "011e78b4-e89a-4653-a2ee-7b2739ca04d6", + "metadata": { + 
"id": "011e78b4-e89a-4653-a2ee-7b2739ca04d6" + }, + "source": [ + "- A formatted response with input field looks like as shown below" + ] + }, { "cell_type": "code", "execution_count": 6, @@ -294,7 +340,7 @@ "base_uri": "https://localhost:8080/" }, "id": "F9UQRfjzo4Js", - "outputId": "b56e6c03-f603-4e9d-c1b6-b4a70403caf9" + "outputId": "f05669d2-13a8-4eb3-f549-dab83cec1e00" }, "outputs": [ { @@ -321,11 +367,27 @@ "print(model_input + desired_response)" ] }, + { + "cell_type": "markdown", + "id": "4dc93ddf-431c-49c0-96f2-fb3a79c4d94c", + "metadata": { + "id": "4dc93ddf-431c-49c0-96f2-fb3a79c4d94c" + }, + "source": [ + "- Below is a formatted response without input field" + ] + }, { "cell_type": "code", "execution_count": 7, "id": "a3891fa9-f738-41cd-946c-80ef9a99c346", - "metadata": {}, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "a3891fa9-f738-41cd-946c-80ef9a99c346", + "outputId": "b9550b1f-8b35-4b00-96d3-a1ce2b76daee" + }, "outputs": [ { "name": "stdout", @@ -351,7 +413,9 @@ { "cell_type": "markdown", "id": "4aa8afd5-2a21-49a5-90c3-6a03865a4771", - "metadata": {}, + "metadata": { + "id": "4aa8afd5-2a21-49a5-90c3-6a03865a4771" + }, "source": [ "- Lastly, before we prepare the PyTorch data loaders in the next section, we divide the dataset into a training, validation, and test set" ] @@ -383,7 +447,7 @@ "base_uri": "https://localhost:8080/" }, "id": "-zf6oht6bIUQ", - "outputId": "bf33cd9a-2778-4365-c51d-d394c817c4fb" + "outputId": "5a11a57f-2ce2-408f-e05a-a09cb661e49b" }, "outputs": [ { @@ -405,36 +469,47 @@ { "cell_type": "markdown", "id": "fcaaf606-f913-4445-8301-632ae10d387d", - "metadata": {}, + "metadata": { + "id": "fcaaf606-f913-4445-8301-632ae10d387d" + }, "source": [ "## 7.3 Creating data loaders for an instruction dataset" ] }, + { + "cell_type": "markdown", + "id": "233f63bd-9755-4d07-8884-5e2e5345cf27", + "metadata": { + "id": "233f63bd-9755-4d07-8884-5e2e5345cf27" + }, + "source": [ + "[figure]" + ] + }, + { + 
"cell_type": "markdown", + "id": "b9af423f-aad9-4b3c-bea5-153021c04862", + "metadata": { + "id": "b9af423f-aad9-4b3c-bea5-153021c04862" + }, + "source": [ + "- First, we implement an `InstructionDataset` class that pre-tokenizes all inputs in the dataset, similar to the `SpamDataset` in chapter 6\n", + "\n", + "[figure]" + ] + }, { "cell_type": "code", "execution_count": 10, "id": "K6MWf0lhu8GP", "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "K6MWf0lhu8GP", - "outputId": "bb01c511-4023-4b74-9781-8385da75b391" + "id": "K6MWf0lhu8GP" }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[50256]\n" - ] - } - ], + "outputs": [], "source": [ "import tiktoken\n", "\n", - "tokenizer = tiktoken.get_encoding(\"gpt2\")\n", - "print(tokenizer.encode(\"<|endoftext|>\", allowed_special={\"<|endoftext|>\"}))" + "tokenizer = tiktoken.get_encoding(\"gpt2\")" ] }, { @@ -461,9 +536,7 @@ " response_text = f\"\\n\\n### Response:\\n{entry['output']}\"\n", " full_text = instruction_plus_input + response_text\n", " self.encoded_texts.append(\n", - " tokenizer.encode(\n", - " full_text, allowed_special={\"<|endoftext|>\"}\n", - " )\n", + " tokenizer.encode(full_text)\n", " )\n", "\n", " def __getitem__(self, index):\n", @@ -476,13 +549,37 @@ { "cell_type": "code", "execution_count": 12, + "id": "ff24fe1a-5746-461c-ad3d-b6d84a1a7c96", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "ff24fe1a-5746-461c-ad3d-b6d84a1a7c96", + "outputId": "7459dd6d-aaad-49c5-9c82-db9b50358c77" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[50256]\n" + ] + } + ], + "source": [ + "print(tokenizer.encode(\"<|endoftext|>\", allowed_special={\"<|endoftext|>\"}))" + ] + }, + { + "cell_type": "code", + "execution_count": 13, "id": "W2jvh-OP9MFV", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "W2jvh-OP9MFV", - "outputId": 
"7878ef5f-635a-491a-99b2-07b3319daefc" + "outputId": "b3f94569-8997-461b-909e-b469e0b3c089" }, "outputs": [ { @@ -491,7 +588,7 @@ "tensor(1.1269)" ] }, - "execution_count": 12, + "execution_count": 13, "metadata": {}, "output_type": "execute_result" } @@ -510,10 +607,14 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 14, "id": "nvVMuil89v9N", "metadata": { - "id": "nvVMuil89v9N" + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "nvVMuil89v9N", + "outputId": "5d9f0948-ddc2-4766-c2ba-c14ca550e9d1" }, "outputs": [ { @@ -522,7 +623,7 @@ "tensor(0.7936)" ] }, - "execution_count": 13, + "execution_count": 14, "metadata": {}, "output_type": "execute_result" } @@ -539,14 +640,14 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 15, "id": "RTyB1vah9p56", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "RTyB1vah9p56", - "outputId": "f1c132ad-85db-411d-cfc8-1d9ab3aec79d" + "outputId": "245a8257-d1a3-4e94-a062-07b820b71aed" }, "outputs": [ { @@ -555,7 +656,7 @@ "tensor(1.1269)" ] }, - "execution_count": 14, + "execution_count": 15, "metadata": {}, "output_type": "execute_result" } @@ -572,24 +673,28 @@ }, { "cell_type": "code", - "execution_count": 15, + "execution_count": 16, "id": "41ec6e2d-9eb2-4124-913e-d2af39be4cf2", - "metadata": {}, + "metadata": { + "id": "41ec6e2d-9eb2-4124-913e-d2af39be4cf2" + }, "outputs": [], "source": [ "def custom_collate_fn(\n", - " batch, \n", + " batch,\n", " pad_token_id=50256,\n", " ignore_index=-100,\n", - " allowed_max_length=None, \n", + " allowed_max_length=None,\n", " device=\"cpu\"\n", "):\n", " # Find the longest sequence in the batch\n", - " batch_max_length = max(len(item) for item in batch)\n", + " batch_max_length = max(len(item)+1 for item in batch)\n", "\n", " # Pad and prepare inputs and targets\n", " inputs_lst, targets_lst = [], []\n", " for item in batch:\n", + " # Add an <|endoftext|> token\n", + " item += [pad_token_id]\n", " # Pad 
sequences to max_length\n", " padded = item + [pad_token_id] * (batch_max_length - len(item))\n", " inputs = torch.tensor(padded[:-1]) # Truncate the last token for inputs\n", @@ -617,26 +722,32 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": 17, "id": "cdf5eec4-9ebe-4be0-9fca-9a47bee88fdc", - "metadata": {}, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "cdf5eec4-9ebe-4be0-9fca-9a47bee88fdc", + "outputId": "0484b12b-b0d6-4329-d6d3-7a2b05fbaf8e" + }, "outputs": [ { "data": { "text/plain": [ - "(tensor([[ 0, 1, 2, 3, 4, 50256],\n", - " [ 7, 8, 9, 50256, 50256, 50256]]),\n", - " tensor([[ 1, 2, 3, 4, 50256, -100],\n", - " [ 8, 9, 50256, -100, -100, -100]]))" + "(tensor([[ 0, 1, 2, 3, 4, 5, 6],\n", + " [ 7, 8, 9, 50256, 50256, 50256, 50256]]),\n", + " tensor([[ 1, 2, 3, 4, 5, 6, 50256],\n", + " [ 8, 9, 50256, -100, -100, -100, -100]]))" ] }, - "execution_count": 16, + "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "inputs_1 = [0, 1, 2, 3, 4, 50256, 50256]\n", + "inputs_1 = [0, 1, 2, 3, 4, 5, 6]\n", "inputs_2 = [7, 8, 9]\n", "\n", "batch = (\n", @@ -649,17 +760,21 @@ }, { "cell_type": "code", - "execution_count": 17, + "execution_count": 18, "id": "etpqqWh8phKc", "metadata": { - "id": "etpqqWh8phKc" + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "etpqqWh8phKc", + "outputId": "f2f902d2-d51a-4a62-a2ae-b1f52037c92f" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Device: cpu\n" + "Device: cuda\n" ] } ], @@ -674,14 +789,10 @@ }, { "cell_type": "code", - "execution_count": 18, + "execution_count": 22, "id": "BtWkgir6Hlpe", "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "BtWkgir6Hlpe", - "outputId": "8e3a969d-e1f6-4574-cc07-3f8401068555" + "id": "BtWkgir6Hlpe" }, "outputs": [], "source": [ @@ -705,7 +816,7 @@ }, { "cell_type": "code", - "execution_count": 19, + "execution_count": 23, "id": 
"1d097dc8-ad34-4f05-b435-e4147965f532", "metadata": { "id": "1d097dc8-ad34-4f05-b435-e4147965f532" @@ -733,14 +844,14 @@ }, { "cell_type": "code", - "execution_count": 20, + "execution_count": 24, "id": "GGs1AI3vHpnX", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "GGs1AI3vHpnX", - "outputId": "eaabe39c-bb78-4fec-979c-6382c192a79f" + "outputId": "df95971c-10ca-49e8-9823-d63bc5b6a3fc" }, "outputs": [ { @@ -748,122 +859,122 @@ "output_type": "stream", "text": [ "Train loader:\n", - "torch.Size([8, 60]) torch.Size([8, 60])\n", - "torch.Size([8, 75]) torch.Size([8, 75])\n", + "torch.Size([8, 61]) torch.Size([8, 61])\n", + "torch.Size([8, 76]) torch.Size([8, 76])\n", + "torch.Size([8, 73]) torch.Size([8, 73])\n", + "torch.Size([8, 68]) torch.Size([8, 68])\n", + "torch.Size([8, 65]) torch.Size([8, 65])\n", "torch.Size([8, 72]) torch.Size([8, 72])\n", - "torch.Size([8, 67]) torch.Size([8, 67])\n", - "torch.Size([8, 64]) torch.Size([8, 64])\n", - "torch.Size([8, 71]) torch.Size([8, 71])\n", - "torch.Size([8, 79]) torch.Size([8, 79])\n", - "torch.Size([8, 66]) torch.Size([8, 66])\n", - "torch.Size([8, 61]) torch.Size([8, 61])\n", - "torch.Size([8, 74]) torch.Size([8, 74])\n", - "torch.Size([8, 61]) torch.Size([8, 61])\n", - "torch.Size([8, 67]) torch.Size([8, 67])\n", - "torch.Size([8, 66]) torch.Size([8, 66])\n", - "torch.Size([8, 76]) torch.Size([8, 76])\n", - "torch.Size([8, 68]) torch.Size([8, 68])\n", - "torch.Size([8, 78]) torch.Size([8, 78])\n", - "torch.Size([8, 70]) torch.Size([8, 70])\n", - "torch.Size([8, 65]) torch.Size([8, 65])\n", - "torch.Size([8, 82]) torch.Size([8, 82])\n", - "torch.Size([8, 67]) torch.Size([8, 67])\n", - "torch.Size([8, 79]) torch.Size([8, 79])\n", - "torch.Size([8, 70]) torch.Size([8, 70])\n", - "torch.Size([8, 68]) torch.Size([8, 68])\n", - "torch.Size([8, 64]) torch.Size([8, 64])\n", - "torch.Size([8, 67]) torch.Size([8, 67])\n", - "torch.Size([8, 59]) torch.Size([8, 59])\n", - "torch.Size([8, 58]) 
torch.Size([8, 58])\n", - "torch.Size([8, 68]) torch.Size([8, 68])\n", - "torch.Size([8, 62]) torch.Size([8, 62])\n", - "torch.Size([8, 64]) torch.Size([8, 64])\n", - "torch.Size([8, 75]) torch.Size([8, 75])\n", - "torch.Size([8, 65]) torch.Size([8, 65])\n", - "torch.Size([8, 70]) torch.Size([8, 70])\n", - "torch.Size([8, 90]) torch.Size([8, 90])\n", - "torch.Size([8, 64]) torch.Size([8, 64])\n", - "torch.Size([8, 63]) torch.Size([8, 63])\n", - "torch.Size([8, 66]) torch.Size([8, 66])\n", - "torch.Size([8, 65]) torch.Size([8, 65])\n", - "torch.Size([8, 63]) torch.Size([8, 63])\n", - "torch.Size([8, 64]) torch.Size([8, 64])\n", - "torch.Size([8, 74]) torch.Size([8, 74])\n", - "torch.Size([8, 88]) torch.Size([8, 88])\n", - "torch.Size([8, 58]) torch.Size([8, 58])\n", - "torch.Size([8, 87]) torch.Size([8, 87])\n", - "torch.Size([8, 82]) torch.Size([8, 82])\n", - "torch.Size([8, 82]) torch.Size([8, 82])\n", - "torch.Size([8, 69]) torch.Size([8, 69])\n", - "torch.Size([8, 64]) torch.Size([8, 64])\n", - "torch.Size([8, 73]) torch.Size([8, 73])\n", - "torch.Size([8, 75]) torch.Size([8, 75])\n", - "torch.Size([8, 66]) torch.Size([8, 66])\n", - "torch.Size([8, 74]) torch.Size([8, 74])\n", - "torch.Size([8, 82]) torch.Size([8, 82])\n", - "torch.Size([8, 68]) torch.Size([8, 68])\n", - "torch.Size([8, 66]) torch.Size([8, 66])\n", - "torch.Size([8, 59]) torch.Size([8, 59])\n", - "torch.Size([8, 59]) torch.Size([8, 59])\n", - "torch.Size([8, 65]) torch.Size([8, 65])\n", - "torch.Size([8, 79]) torch.Size([8, 79])\n", - "torch.Size([8, 70]) torch.Size([8, 70])\n", - "torch.Size([8, 60]) torch.Size([8, 60])\n", - "torch.Size([8, 57]) torch.Size([8, 57])\n", - "torch.Size([8, 70]) torch.Size([8, 70])\n", - "torch.Size([8, 66]) torch.Size([8, 66])\n", - "torch.Size([8, 67]) torch.Size([8, 67])\n", - "torch.Size([8, 62]) torch.Size([8, 62])\n", - "torch.Size([8, 86]) torch.Size([8, 86])\n", - "torch.Size([8, 67]) torch.Size([8, 67])\n", - "torch.Size([8, 63]) torch.Size([8, 63])\n", - 
"torch.Size([8, 67]) torch.Size([8, 67])\n", - "torch.Size([8, 70]) torch.Size([8, 70])\n", - "torch.Size([8, 67]) torch.Size([8, 67])\n", - "torch.Size([8, 70]) torch.Size([8, 70])\n", - "torch.Size([8, 60]) torch.Size([8, 60])\n", - "torch.Size([8, 64]) torch.Size([8, 64])\n", - "torch.Size([8, 66]) torch.Size([8, 66])\n", - "torch.Size([8, 64]) torch.Size([8, 64])\n", - "torch.Size([8, 63]) torch.Size([8, 63])\n", - "torch.Size([8, 59]) torch.Size([8, 59])\n", - "torch.Size([8, 71]) torch.Size([8, 71])\n", - "torch.Size([8, 63]) torch.Size([8, 63])\n", - "torch.Size([8, 69]) torch.Size([8, 69])\n", - "torch.Size([8, 56]) torch.Size([8, 56])\n", - "torch.Size([8, 71]) torch.Size([8, 71])\n", - "torch.Size([8, 63]) torch.Size([8, 63])\n", - "torch.Size([8, 67]) torch.Size([8, 67])\n", - "torch.Size([8, 61]) torch.Size([8, 61])\n", - "torch.Size([8, 73]) torch.Size([8, 73])\n", - "torch.Size([8, 79]) torch.Size([8, 79])\n", - "torch.Size([8, 67]) torch.Size([8, 67])\n", - "torch.Size([8, 69]) torch.Size([8, 69])\n", - "torch.Size([8, 90]) torch.Size([8, 90])\n", - "torch.Size([8, 60]) torch.Size([8, 60])\n", - "torch.Size([8, 65]) torch.Size([8, 65])\n", - "torch.Size([8, 79]) torch.Size([8, 79])\n", "torch.Size([8, 80]) torch.Size([8, 80])\n", - "torch.Size([8, 73]) torch.Size([8, 73])\n", - "torch.Size([8, 81]) torch.Size([8, 81])\n", + "torch.Size([8, 67]) torch.Size([8, 67])\n", "torch.Size([8, 62]) torch.Size([8, 62])\n", - "torch.Size([8, 82]) torch.Size([8, 82])\n", + "torch.Size([8, 75]) torch.Size([8, 75])\n", + "torch.Size([8, 62]) torch.Size([8, 62])\n", + "torch.Size([8, 68]) torch.Size([8, 68])\n", + "torch.Size([8, 67]) torch.Size([8, 67])\n", + "torch.Size([8, 77]) torch.Size([8, 77])\n", + "torch.Size([8, 69]) torch.Size([8, 69])\n", + "torch.Size([8, 79]) torch.Size([8, 79])\n", + "torch.Size([8, 71]) torch.Size([8, 71])\n", + "torch.Size([8, 66]) torch.Size([8, 66])\n", + "torch.Size([8, 83]) torch.Size([8, 83])\n", + "torch.Size([8, 68]) 
torch.Size([8, 68])\n", + "torch.Size([8, 80]) torch.Size([8, 80])\n", + "torch.Size([8, 71]) torch.Size([8, 71])\n", + "torch.Size([8, 69]) torch.Size([8, 69])\n", + "torch.Size([8, 65]) torch.Size([8, 65])\n", + "torch.Size([8, 68]) torch.Size([8, 68])\n", + "torch.Size([8, 60]) torch.Size([8, 60])\n", + "torch.Size([8, 59]) torch.Size([8, 59])\n", + "torch.Size([8, 69]) torch.Size([8, 69])\n", + "torch.Size([8, 63]) torch.Size([8, 63])\n", + "torch.Size([8, 65]) torch.Size([8, 65])\n", + "torch.Size([8, 76]) torch.Size([8, 76])\n", + "torch.Size([8, 66]) torch.Size([8, 66])\n", + "torch.Size([8, 71]) torch.Size([8, 71])\n", + "torch.Size([8, 91]) torch.Size([8, 91])\n", + "torch.Size([8, 65]) torch.Size([8, 65])\n", + "torch.Size([8, 64]) torch.Size([8, 64])\n", "torch.Size([8, 67]) torch.Size([8, 67])\n", "torch.Size([8, 66]) torch.Size([8, 66])\n", - "torch.Size([8, 76]) torch.Size([8, 76])\n", - "torch.Size([8, 90]) torch.Size([8, 90])\n", - "torch.Size([8, 63]) torch.Size([8, 63])\n", - "torch.Size([8, 60]) torch.Size([8, 60])\n", + "torch.Size([8, 64]) torch.Size([8, 64])\n", + "torch.Size([8, 65]) torch.Size([8, 65])\n", + "torch.Size([8, 75]) torch.Size([8, 75])\n", + "torch.Size([8, 89]) torch.Size([8, 89])\n", + "torch.Size([8, 59]) torch.Size([8, 59])\n", + "torch.Size([8, 88]) torch.Size([8, 88])\n", + "torch.Size([8, 83]) torch.Size([8, 83])\n", + "torch.Size([8, 83]) torch.Size([8, 83])\n", + "torch.Size([8, 70]) torch.Size([8, 70])\n", + "torch.Size([8, 65]) torch.Size([8, 65])\n", "torch.Size([8, 74]) torch.Size([8, 74])\n", + "torch.Size([8, 76]) torch.Size([8, 76])\n", + "torch.Size([8, 67]) torch.Size([8, 67])\n", + "torch.Size([8, 75]) torch.Size([8, 75])\n", + "torch.Size([8, 83]) torch.Size([8, 83])\n", + "torch.Size([8, 69]) torch.Size([8, 69])\n", + "torch.Size([8, 67]) torch.Size([8, 67])\n", + "torch.Size([8, 60]) torch.Size([8, 60])\n", + "torch.Size([8, 60]) torch.Size([8, 60])\n", + "torch.Size([8, 66]) torch.Size([8, 66])\n", + 
"torch.Size([8, 80]) torch.Size([8, 80])\n", + "torch.Size([8, 71]) torch.Size([8, 71])\n", + "torch.Size([8, 61]) torch.Size([8, 61])\n", + "torch.Size([8, 58]) torch.Size([8, 58])\n", + "torch.Size([8, 71]) torch.Size([8, 71])\n", + "torch.Size([8, 67]) torch.Size([8, 67])\n", + "torch.Size([8, 68]) torch.Size([8, 68])\n", "torch.Size([8, 63]) torch.Size([8, 63])\n", + "torch.Size([8, 87]) torch.Size([8, 87])\n", + "torch.Size([8, 68]) torch.Size([8, 68])\n", + "torch.Size([8, 64]) torch.Size([8, 64])\n", + "torch.Size([8, 68]) torch.Size([8, 68])\n", + "torch.Size([8, 71]) torch.Size([8, 71])\n", + "torch.Size([8, 68]) torch.Size([8, 68])\n", + "torch.Size([8, 71]) torch.Size([8, 71])\n", + "torch.Size([8, 61]) torch.Size([8, 61])\n", "torch.Size([8, 65]) torch.Size([8, 65])\n", - "torch.Size([8, 77]) torch.Size([8, 77])\n", + "torch.Size([8, 67]) torch.Size([8, 67])\n", "torch.Size([8, 65]) torch.Size([8, 65])\n", - "torch.Size([8, 63]) torch.Size([8, 63])\n", + "torch.Size([8, 64]) torch.Size([8, 64])\n", + "torch.Size([8, 60]) torch.Size([8, 60])\n", + "torch.Size([8, 72]) torch.Size([8, 72])\n", + "torch.Size([8, 64]) torch.Size([8, 64])\n", + "torch.Size([8, 70]) torch.Size([8, 70])\n", + "torch.Size([8, 57]) torch.Size([8, 57])\n", + "torch.Size([8, 72]) torch.Size([8, 72])\n", + "torch.Size([8, 64]) torch.Size([8, 64])\n", + "torch.Size([8, 68]) torch.Size([8, 68])\n", + "torch.Size([8, 62]) torch.Size([8, 62])\n", + "torch.Size([8, 74]) torch.Size([8, 74])\n", + "torch.Size([8, 80]) torch.Size([8, 80])\n", + "torch.Size([8, 68]) torch.Size([8, 68])\n", + "torch.Size([8, 70]) torch.Size([8, 70])\n", + "torch.Size([8, 91]) torch.Size([8, 91])\n", + "torch.Size([8, 61]) torch.Size([8, 61])\n", + "torch.Size([8, 66]) torch.Size([8, 66])\n", + "torch.Size([8, 80]) torch.Size([8, 80])\n", + "torch.Size([8, 81]) torch.Size([8, 81])\n", + "torch.Size([8, 74]) torch.Size([8, 74])\n", "torch.Size([8, 82]) torch.Size([8, 82])\n", - "torch.Size([8, 65]) 
torch.Size([8, 65])\n", - "torch.Size([8, 73]) torch.Size([8, 73])\n", - "torch.Size([8, 68]) torch.Size([8, 68])\n" + "torch.Size([8, 63]) torch.Size([8, 63])\n", + "torch.Size([8, 83]) torch.Size([8, 83])\n", + "torch.Size([8, 68]) torch.Size([8, 68])\n", + "torch.Size([8, 67]) torch.Size([8, 67])\n", + "torch.Size([8, 77]) torch.Size([8, 77])\n", + "torch.Size([8, 91]) torch.Size([8, 91])\n", + "torch.Size([8, 64]) torch.Size([8, 64])\n", + "torch.Size([8, 61]) torch.Size([8, 61])\n", + "torch.Size([8, 75]) torch.Size([8, 75])\n", + "torch.Size([8, 64]) torch.Size([8, 64])\n", + "torch.Size([8, 66]) torch.Size([8, 66])\n", + "torch.Size([8, 78]) torch.Size([8, 78])\n", + "torch.Size([8, 66]) torch.Size([8, 66])\n", + "torch.Size([8, 64]) torch.Size([8, 64])\n", + "torch.Size([8, 83]) torch.Size([8, 83])\n", + "torch.Size([8, 66]) torch.Size([8, 66])\n", + "torch.Size([8, 74]) torch.Size([8, 74])\n", + "torch.Size([8, 69]) torch.Size([8, 69])\n" ] } ], @@ -875,21 +986,21 @@ }, { "cell_type": "code", - "execution_count": 21, + "execution_count": 25, "id": "21b8fd02-014f-4481-9b71-5bfee8f9dfcd", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "21b8fd02-014f-4481-9b71-5bfee8f9dfcd", - "outputId": "71ce098a-36b7-44fa-8c7c-f63db448fe40" + "outputId": "cacf7f22-ec66-4350-8db4-890e7e86718f" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "21106, 318, 281, 12064, 326, 8477, 257, 4876, 13, 19430, 257, 2882, 326, 20431, 32543, 262, 2581, 13, 198, 198, 21017, 46486, 25, 198, 30003, 6525, 262, 6827, 1262, 257, 985, 576, 13, 198, 198, 21017, 23412, 25, 198, 464, 5156, 318, 845, 13779, 13, 198, 198, 21017, 18261, 25, 198, 464, 5156, 318, 355, 13779, 355, 257, 4936, 13, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, " + "21106, 318, 281, 12064, 326, 8477, 257, 4876, 13, 19430, 257, 2882, 326, 20431, 32543, 262, 2581, 13, 198, 198, 21017, 46486, 25, 198, 30003, 6525, 262, 6827, 1262, 257, 985, 576, 13, 198, 198, 21017, 
23412, 25, 198, 464, 5156, 318, 845, 13779, 13, 198, 198, 21017, 18261, 25, 198, 464, 5156, 318, 355, 13779, 355, 257, 4936, 13, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, " ] } ], @@ -900,21 +1011,21 @@ }, { "cell_type": "code", - "execution_count": 22, + "execution_count": 26, "id": "51649ab4-1a7e-4a9e-92c5-950a24fde211", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "51649ab4-1a7e-4a9e-92c5-950a24fde211", - "outputId": "4cf98eac-b7f7-4687-b264-4508c0865865" + "outputId": "486fda24-80d4-4bc2-f253-2476f93cd146" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "318, 281, 12064, 326, 8477, 257, 4876, 13, 19430, 257, 2882, 326, 20431, 32543, 262, 2581, 13, 198, 198, 21017, 46486, 25, 198, 30003, 6525, 262, 6827, 1262, 257, 985, 576, 13, 198, 198, 21017, 23412, 25, 198, 464, 5156, 318, 845, 13779, 13, 198, 198, 21017, 18261, 25, 198, 464, 5156, 318, 355, 13779, 355, 257, 4936, 13, 50256, -100, -100, -100, -100, -100, -100, -100, -100, " + "318, 281, 12064, 326, 8477, 257, 4876, 13, 19430, 257, 2882, 326, 20431, 32543, 262, 2581, 13, 198, 198, 21017, 46486, 25, 198, 30003, 6525, 262, 6827, 1262, 257, 985, 576, 13, 198, 198, 21017, 23412, 25, 198, 464, 5156, 318, 845, 13779, 13, 198, 198, 21017, 18261, 25, 198, 464, 5156, 318, 355, 13779, 355, 257, 4936, 13, 50256, -100, -100, -100, -100, -100, -100, -100, -100, -100, " ] } ], @@ -935,27 +1046,27 @@ }, { "cell_type": "code", - "execution_count": 23, + "execution_count": 27, "id": "0d249d67-5eba-414e-9bd2-972ebf01329d", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "0d249d67-5eba-414e-9bd2-972ebf01329d", - "outputId": "0ccd8d13-4f8a-44ce-ea22-0b6ea36bb06e" + "outputId": "ca78e098-c253-4bbe-ebb5-6fd018d8e037" }, "outputs": [ { - "name": "stdout", + "name": "stderr", "output_type": "stream", "text": [ - "File already exists and is up-to-date: gpt2/355M/checkpoint\n", - "File already exists and is up-to-date: 
gpt2/355M/encoder.json\n", - "File already exists and is up-to-date: gpt2/355M/hparams.json\n", - "File already exists and is up-to-date: gpt2/355M/model.ckpt.data-00000-of-00001\n", - "File already exists and is up-to-date: gpt2/355M/model.ckpt.index\n", - "File already exists and is up-to-date: gpt2/355M/model.ckpt.meta\n", - "File already exists and is up-to-date: gpt2/355M/vocab.bpe\n" + "checkpoint: 100%|██████████| 77.0/77.0 [00:00<00:00, 116kiB/s]\n", + "encoder.json: 100%|██████████| 1.04M/1.04M [00:02<00:00, 509kiB/s]\n", + "hparams.json: 100%|██████████| 91.0/91.0 [00:00<00:00, 138kiB/s]\n", + "model.ckpt.data-00000-of-00001: 100%|██████████| 1.42G/1.42G [02:49<00:00, 8.38MiB/s]\n", + "model.ckpt.index: 100%|██████████| 10.4k/10.4k [00:00<00:00, 13.8MiB/s]\n", + "model.ckpt.meta: 100%|██████████| 927k/927k [00:02<00:00, 454kiB/s]\n", + "vocab.bpe: 100%|██████████| 456k/456k [00:01<00:00, 321kiB/s]\n" ] } ], @@ -992,14 +1103,14 @@ }, { "cell_type": "code", - "execution_count": 24, + "execution_count": 28, "id": "7bd32b7c-5b44-4d25-a09f-46836802ca74", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "7bd32b7c-5b44-4d25-a09f-46836802ca74", - "outputId": "de446b9d-7667-48a5-c34a-f3c5cf70459b" + "outputId": "e5dbf217-591c-4c2e-9ec2-ef5365fa269e" }, "outputs": [ { @@ -1052,7 +1163,7 @@ }, { "cell_type": "code", - "execution_count": 25, + "execution_count": 29, "id": "65444865-df87-4d98-9faf-875e1c4be860", "metadata": { "id": "65444865-df87-4d98-9faf-875e1c4be860" @@ -1067,22 +1178,22 @@ }, { "cell_type": "code", - "execution_count": 26, + "execution_count": 30, "id": "d99fc6f8-63b2-43da-adbb-a7b6b92c8dd5", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "d99fc6f8-63b2-43da-adbb-a7b6b92c8dd5", - "outputId": "0c815e75-9357-42e6-fdf3-3ea13ffa4da4" + "outputId": "a4d82a24-f16e-4cf7-ebe6-0bff051517a1" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Training loss: 3.8234103202819822\n", - 
"Validation loss: 3.7612109184265137\n" + "Training loss: 3.8259091854095457\n", + "Validation loss: 3.7619335651397705\n" ] } ], @@ -1102,7 +1213,9 @@ { "cell_type": "markdown", "id": "db4b57fb-e689-4550-931c-6d34a932487c", - "metadata": {}, + "metadata": { + "id": "db4b57fb-e689-4550-931c-6d34a932487c" + }, "source": [ "- Runtimes:\n", "\n", @@ -1111,105 +1224,106 @@ "| Model | Platform | Runtime |\n", "|--------------------|-----------------------|----------------|\n", "| gpt2-medium (355M) | CPU (M3 MacBook Air) | 23.67 minutes |\n", + "| gpt2-medium (355M) | GPU (L4) | 2.98 minutes |\n", "| gpt2-medium (355M) | GPU (A100) | 1.29 minutes |\n", - "| gpt2-small (124M) | CPU (M3 MacBook Air) | 8.61 |\n", + "| gpt2-small (124M) | CPU (M3 MacBook Air) | 8.61 minutes |\n", "| gpt2-small (124M) | GPU (A100) | 0.59 minutes |\n", "\n", "\n", "\n", - "- Remainder of the notebook was run on M3 MacBook Air with the `\"gpt2-medium (355M)\"` model" + "- This notebook was run with the `\"gpt2-medium (355M)\"` model" ] }, { "cell_type": "code", - "execution_count": 27, + "execution_count": 31, "id": "78bcf83a-1fff-4540-97c1-765c4016d5e3", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "78bcf83a-1fff-4540-97c1-765c4016d5e3", - "outputId": "315368d9-5484-4527-f42d-b0d650d6aa23" + "outputId": "285ca27c-019f-4c2b-e130-8c46d2e7df53" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Ep 1 (Step 000000): Train loss 2.636, Val loss 2.627\n", - "Ep 1 (Step 000005): Train loss 1.173, Val loss 1.103\n", - "Ep 1 (Step 000010): Train loss 0.873, Val loss 0.947\n", - "Ep 1 (Step 000015): Train loss 0.856, Val loss 0.907\n", - "Ep 1 (Step 000020): Train loss 0.777, Val loss 0.882\n", - "Ep 1 (Step 000025): Train loss 0.754, Val loss 0.860\n", - "Ep 1 (Step 000030): Train loss 0.799, Val loss 0.838\n", - "Ep 1 (Step 000035): Train loss 0.715, Val loss 0.810\n", - "Ep 1 (Step 000040): Train loss 0.673, Val loss 0.807\n", - "Ep 1 (Step 000045): 
Train loss 0.634, Val loss 0.791\n", - "Ep 1 (Step 000050): Train loss 0.663, Val loss 0.784\n", - "Ep 1 (Step 000055): Train loss 0.760, Val loss 0.764\n", - "Ep 1 (Step 000060): Train loss 0.721, Val loss 0.745\n", - "Ep 1 (Step 000065): Train loss 0.654, Val loss 0.736\n", - "Ep 1 (Step 000070): Train loss 0.535, Val loss 0.730\n", - "Ep 1 (Step 000075): Train loss 0.569, Val loss 0.729\n", - "Ep 1 (Step 000080): Train loss 0.606, Val loss 0.726\n", - "Ep 1 (Step 000085): Train loss 0.511, Val loss 0.710\n", - "Ep 1 (Step 000090): Train loss 0.563, Val loss 0.691\n", - "Ep 1 (Step 000095): Train loss 0.501, Val loss 0.682\n", - "Ep 1 (Step 000100): Train loss 0.504, Val loss 0.678\n", - "Ep 1 (Step 000105): Train loss 0.566, Val loss 0.671\n", - "Ep 1 (Step 000110): Train loss 0.556, Val loss 0.668\n", - "Ep 1 (Step 000115): Train loss 0.509, Val loss 0.665\n", + "Ep 1 (Step 000000): Train loss 2.637, Val loss 2.626\n", + "Ep 1 (Step 000005): Train loss 1.174, Val loss 1.102\n", + "Ep 1 (Step 000010): Train loss 0.872, Val loss 0.944\n", + "Ep 1 (Step 000015): Train loss 0.857, Val loss 0.906\n", + "Ep 1 (Step 000020): Train loss 0.776, Val loss 0.881\n", + "Ep 1 (Step 000025): Train loss 0.754, Val loss 0.859\n", + "Ep 1 (Step 000030): Train loss 0.799, Val loss 0.836\n", + "Ep 1 (Step 000035): Train loss 0.714, Val loss 0.808\n", + "Ep 1 (Step 000040): Train loss 0.672, Val loss 0.806\n", + "Ep 1 (Step 000045): Train loss 0.633, Val loss 0.789\n", + "Ep 1 (Step 000050): Train loss 0.663, Val loss 0.783\n", + "Ep 1 (Step 000055): Train loss 0.760, Val loss 0.763\n", + "Ep 1 (Step 000060): Train loss 0.719, Val loss 0.743\n", + "Ep 1 (Step 000065): Train loss 0.653, Val loss 0.735\n", + "Ep 1 (Step 000070): Train loss 0.532, Val loss 0.729\n", + "Ep 1 (Step 000075): Train loss 0.569, Val loss 0.728\n", + "Ep 1 (Step 000080): Train loss 0.605, Val loss 0.725\n", + "Ep 1 (Step 000085): Train loss 0.509, Val loss 0.709\n", + "Ep 1 (Step 000090): Train loss 0.562, 
Val loss 0.691\n", + "Ep 1 (Step 000095): Train loss 0.500, Val loss 0.681\n", + "Ep 1 (Step 000100): Train loss 0.503, Val loss 0.677\n", + "Ep 1 (Step 000105): Train loss 0.564, Val loss 0.670\n", + "Ep 1 (Step 000110): Train loss 0.555, Val loss 0.666\n", + "Ep 1 (Step 000115): Train loss 0.508, Val loss 0.664\n", "Below is an instruction that describes a task. Write a response that appropriately completes the request. ### Instruction: Convert the active sentence to passive: 'The chef cooks the meal every day.' ### Response: The meal is prepared every day by the chef.<|endoftext|>The following is an instruction that describes a task. Write a response that appropriately completes the request. ### Instruction: Convert the active sentence to passive:\n", - "Ep 2 (Step 000120): Train loss 0.436, Val loss 0.672\n", + "Ep 2 (Step 000120): Train loss 0.435, Val loss 0.672\n", "Ep 2 (Step 000125): Train loss 0.451, Val loss 0.687\n", - "Ep 2 (Step 000130): Train loss 0.447, Val loss 0.682\n", - "Ep 2 (Step 000135): Train loss 0.405, Val loss 0.681\n", - "Ep 2 (Step 000140): Train loss 0.407, Val loss 0.680\n", - "Ep 2 (Step 000145): Train loss 0.370, Val loss 0.681\n", - "Ep 2 (Step 000150): Train loss 0.382, Val loss 0.676\n", - "Ep 2 (Step 000155): Train loss 0.413, Val loss 0.676\n", - "Ep 2 (Step 000160): Train loss 0.414, Val loss 0.685\n", - "Ep 2 (Step 000165): Train loss 0.379, Val loss 0.688\n", - "Ep 2 (Step 000170): Train loss 0.322, Val loss 0.683\n", - "Ep 2 (Step 000175): Train loss 0.338, Val loss 0.670\n", - "Ep 2 (Step 000180): Train loss 0.393, Val loss 0.659\n", - "Ep 2 (Step 000185): Train loss 0.417, Val loss 0.659\n", - "Ep 2 (Step 000190): Train loss 0.342, Val loss 0.649\n", - "Ep 2 (Step 000195): Train loss 0.330, Val loss 0.635\n", - "Ep 2 (Step 000200): Train loss 0.312, Val loss 0.634\n", - "Ep 2 (Step 000205): Train loss 0.355, Val loss 0.630\n", - "Ep 2 (Step 000210): Train loss 0.371, Val loss 0.629\n", - "Ep 2 (Step 000215): Train loss 
0.394, Val loss 0.633\n", - "Ep 2 (Step 000220): Train loss 0.302, Val loss 0.646\n", - "Ep 2 (Step 000225): Train loss 0.344, Val loss 0.659\n", - "Ep 2 (Step 000230): Train loss 0.292, Val loss 0.656\n", - "Below is an instruction that describes a task. Write a response that appropriately completes the request. ### Instruction: Convert the active sentence to passive: 'The chef cooks the meal every day.' ### Response: The meal is cooked everyday by the chef.<|endoftext|>The following is an instruction that describes a task. Write a response that appropriately completes the request. ### Instruction: What is the capital of the United Kingdom\n", - "Ep 3 (Step 000235): Train loss 0.327, Val loss 0.663\n", - "Ep 3 (Step 000240): Train loss 0.275, Val loss 0.693\n", - "Ep 3 (Step 000245): Train loss 0.275, Val loss 0.707\n", - "Ep 3 (Step 000250): Train loss 0.246, Val loss 0.698\n", - "Ep 3 (Step 000255): Train loss 0.277, Val loss 0.688\n", - "Ep 3 (Step 000260): Train loss 0.268, Val loss 0.687\n", - "Ep 3 (Step 000265): Train loss 0.269, Val loss 0.694\n", - "Ep 3 (Step 000270): Train loss 0.282, Val loss 0.707\n", - "Ep 3 (Step 000275): Train loss 0.275, Val loss 0.701\n", - "Ep 3 (Step 000280): Train loss 0.293, Val loss 0.709\n", - "Ep 3 (Step 000285): Train loss 0.291, Val loss 0.711\n", - "Ep 3 (Step 000290): Train loss 0.288, Val loss 0.710\n", - "Ep 3 (Step 000295): Train loss 0.268, Val loss 0.703\n", - "Ep 3 (Step 000300): Train loss 0.262, Val loss 0.691\n", - "Ep 3 (Step 000305): Train loss 0.268, Val loss 0.688\n", - "Ep 3 (Step 000310): Train loss 0.270, Val loss 0.692\n", - "Ep 3 (Step 000315): Train loss 0.234, Val loss 0.697\n", - "Ep 3 (Step 000320): Train loss 0.252, Val loss 0.696\n", - "Ep 3 (Step 000325): Train loss 0.235, Val loss 0.701\n", - "Ep 3 (Step 000330): Train loss 0.239, Val loss 0.697\n", - "Ep 3 (Step 000335): Train loss 0.229, Val loss 0.687\n", - "Ep 3 (Step 000340): Train loss 0.246, Val loss 0.684\n", - "Ep 3 (Step 000345): 
Train loss 0.243, Val loss 0.676\n", + "Ep 2 (Step 000130): Train loss 0.447, Val loss 0.683\n", + "Ep 2 (Step 000135): Train loss 0.405, Val loss 0.682\n", + "Ep 2 (Step 000140): Train loss 0.409, Val loss 0.681\n", + "Ep 2 (Step 000145): Train loss 0.369, Val loss 0.680\n", + "Ep 2 (Step 000150): Train loss 0.382, Val loss 0.675\n", + "Ep 2 (Step 000155): Train loss 0.413, Val loss 0.675\n", + "Ep 2 (Step 000160): Train loss 0.415, Val loss 0.683\n", + "Ep 2 (Step 000165): Train loss 0.379, Val loss 0.686\n", + "Ep 2 (Step 000170): Train loss 0.323, Val loss 0.681\n", + "Ep 2 (Step 000175): Train loss 0.337, Val loss 0.669\n", + "Ep 2 (Step 000180): Train loss 0.392, Val loss 0.657\n", + "Ep 2 (Step 000185): Train loss 0.415, Val loss 0.657\n", + "Ep 2 (Step 000190): Train loss 0.340, Val loss 0.648\n", + "Ep 2 (Step 000195): Train loss 0.329, Val loss 0.635\n", + "Ep 2 (Step 000200): Train loss 0.310, Val loss 0.635\n", + "Ep 2 (Step 000205): Train loss 0.352, Val loss 0.631\n", + "Ep 2 (Step 000210): Train loss 0.367, Val loss 0.630\n", + "Ep 2 (Step 000215): Train loss 0.396, Val loss 0.634\n", + "Ep 2 (Step 000220): Train loss 0.300, Val loss 0.647\n", + "Ep 2 (Step 000225): Train loss 0.347, Val loss 0.660\n", + "Ep 2 (Step 000230): Train loss 0.294, Val loss 0.655\n", + "Below is an instruction that describes a task. Write a response that appropriately completes the request. ### Instruction: Convert the active sentence to passive: 'The chef cooks the meal every day.' ### Response: The meal is cooked every day by the chef.<|endoftext|>The following is an instruction that describes a task. Write a response that appropriately completes the request. 
### Instruction: What is the capital of the United Kingdom\n", + "Ep 3 (Step 000235): Train loss 0.328, Val loss 0.661\n", + "Ep 3 (Step 000240): Train loss 0.280, Val loss 0.692\n", + "Ep 3 (Step 000245): Train loss 0.274, Val loss 0.702\n", + "Ep 3 (Step 000250): Train loss 0.248, Val loss 0.691\n", + "Ep 3 (Step 000255): Train loss 0.275, Val loss 0.680\n", + "Ep 3 (Step 000260): Train loss 0.266, Val loss 0.683\n", + "Ep 3 (Step 000265): Train loss 0.274, Val loss 0.701\n", + "Ep 3 (Step 000270): Train loss 0.280, Val loss 0.715\n", + "Ep 3 (Step 000275): Train loss 0.276, Val loss 0.705\n", + "Ep 3 (Step 000280): Train loss 0.296, Val loss 0.710\n", + "Ep 3 (Step 000285): Train loss 0.294, Val loss 0.714\n", + "Ep 3 (Step 000290): Train loss 0.287, Val loss 0.717\n", + "Ep 3 (Step 000295): Train loss 0.267, Val loss 0.711\n", + "Ep 3 (Step 000300): Train loss 0.271, Val loss 0.694\n", + "Ep 3 (Step 000305): Train loss 0.277, Val loss 0.686\n", + "Ep 3 (Step 000310): Train loss 0.276, Val loss 0.689\n", + "Ep 3 (Step 000315): Train loss 0.238, Val loss 0.688\n", + "Ep 3 (Step 000320): Train loss 0.255, Val loss 0.691\n", + "Ep 3 (Step 000325): Train loss 0.235, Val loss 0.693\n", + "Ep 3 (Step 000330): Train loss 0.233, Val loss 0.696\n", + "Ep 3 (Step 000335): Train loss 0.224, Val loss 0.698\n", + "Ep 3 (Step 000340): Train loss 0.243, Val loss 0.687\n", + "Ep 3 (Step 000345): Train loss 0.244, Val loss 0.675\n", "Below is an instruction that describes a task. Write a response that appropriately completes the request. ### Instruction: Convert the active sentence to passive: 'The chef cooks the meal every day.' ### Response: The chef cooks the meal every day.<|endoftext|>The following is an instruction that describes a task. Write a response that appropriately completes the request. ### Instruction: What is the capital of the United Kingdom? 
\n", - "Training completed in 23.67 minutes.\n" + "Training completed in 2.98 minutes.\n" ] } ], @@ -1237,20 +1351,20 @@ }, { "cell_type": "code", - "execution_count": 28, + "execution_count": 32, "id": "1Vdh7jmHI1we", "metadata": { "colab": { "base_uri": "https://localhost:8080/", - "height": 307 + "height": 308 }, "id": "1Vdh7jmHI1we", - "outputId": "97990a7a-605b-4634-9c6f-085d800eed71" + "outputId": "475faf7f-13e6-4168-84f2-3eb3897ffd73" }, "outputs": [ { "data": { - "image/png": "", + "image/png": "\n", "text/plain": [ "
" ] @@ -1269,14 +1383,16 @@ { "cell_type": "markdown", "id": "87b79a47-13f9-4d1f-87b1-3339bafaf2a3", - "metadata": {}, + "metadata": { + "id": "87b79a47-13f9-4d1f-87b1-3339bafaf2a3" + }, "source": [ "## 7.6 Extracting and saving responses" ] }, { "cell_type": "code", - "execution_count": 29, + "execution_count": 33, "id": "F9QyvnRipwNc", "metadata": { "id": "F9QyvnRipwNc" @@ -1289,14 +1405,14 @@ }, { "cell_type": "code", - "execution_count": 30, + "execution_count": 34, "id": "VQ2NZMbfucAc", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "VQ2NZMbfucAc", - "outputId": "4a014e82-0741-4807-a77c-05b770940dd8" + "outputId": "1fd28d43-3fd4-4d94-a63e-07f4a53f41b6" }, "outputs": [ { @@ -1326,7 +1442,7 @@ ">> The type of cloud typically associated with thunderstorms is cumulonimbus.\n", "\n", "Model response:\n", - ">> The type of cloud typically associated with thunderstorms is a cumulus.\n", + ">> The type of cloud typically associated with thunderstorms is a cumulus (thin, water-filled, or gas-filled).\n", "-------------------------------------\n", "Below is an instruction that describes a task. 
Write a response that appropriately completes the request.\n", "\n", @@ -1367,21 +1483,21 @@ }, { "cell_type": "code", - "execution_count": 31, + "execution_count": 35, "id": "-PNGKzY4snKP", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "-PNGKzY4snKP", - "outputId": "b065c0e6-a3b3-4e70-bbfd-17ff69ad317f" + "outputId": "3e16caff-287a-4084-ed93-fcccd68e1da7" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ - "100%|█████████████████████████████████████████| 110/110 [06:24<00:00, 3.50s/it]\n" + "100%|██████████| 110/110 [01:17<00:00, 1.42it/s]\n" ] } ], @@ -1411,14 +1527,14 @@ }, { "cell_type": "code", - "execution_count": 32, + "execution_count": 36, "id": "u-AvCCMTnPSE", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "u-AvCCMTnPSE", - "outputId": "6968bb22-04e5-4473-90bc-4ed6af6aa0cf" + "outputId": "90c7f165-713e-4795-9205-f2f9b4d13313" }, "outputs": [ { @@ -1430,7 +1546,7 @@ " 'model_response': 'The car is as fast as a bullet.'}" ] }, - "execution_count": 32, + "execution_count": 36, "metadata": {}, "output_type": "execute_result" } @@ -1441,10 +1557,14 @@ }, { "cell_type": "code", - "execution_count": 33, + "execution_count": 37, "id": "8cBU0iHmVfOI", "metadata": { - "id": "8cBU0iHmVfOI" + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "8cBU0iHmVfOI", + "outputId": "df6e862f-a6c8-4d23-ac3a-7645fd25a59d" }, "outputs": [ { @@ -1475,15 +1595,18 @@ }, { "cell_type": "code", - "execution_count": 34, + "execution_count": 1, "id": "026e8570-071e-48a2-aa38-64d7be35f288", - "metadata": {}, + "metadata": { + "id": "026e8570-071e-48a2-aa38-64d7be35f288", + "outputId": "ad2e3f89-30a0-4f8b-9d6f-24acf6cf5153" + }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "True\n" + "Ollama running: True\n" ] } ], @@ -1502,31 +1625,62 @@ "\n", "if not ollama_running:\n", " raise RuntimeError(\"Ollama not running. 
Launch ollama before proceeding.\")\n", - "print(check_if_running(\"ollama\"))" + "print(\"Ollama running:\", check_if_running(\"ollama\"))" ] }, { "cell_type": "code", - "execution_count": 35, - "id": "e3ae0e10-2b28-42ce-8ea2-d9366a58088f", + "execution_count": 2, + "id": "723c9b00-e3cd-4092-83c3-6e48b5cf65b0", "metadata": {}, + "outputs": [], + "source": [ + "# This cell is optional; it allows you to restart the notebook \n", + "# and only run section 7.7 without rerunning any of the previous code\n", + "import json \n", + "from tqdm import tqdm\n", + "\n", + "file_path = \"instruction-data-with-response.json\"\n", + "\n", + "with open(file_path, \"r\") as file:\n", + " test_data = json.load(file)\n", + "\n", + "\n", + "def format_input(entry):\n", + " instruction_text = (\n", + " f\"Below is an instruction that describes a task. \"\n", + " f\"Write a response that appropriately completes the request.\"\n", + " f\"\\n\\n### Instruction:\\n{entry['instruction']}\"\n", + " )\n", + "\n", + " input_text = f\"\\n\\n### Input:\\n{entry['input']}\" if entry[\"input\"] else \"\"\n", + "\n", + " return instruction_text + input_text" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "e3ae0e10-2b28-42ce-8ea2-d9366a58088f", + "metadata": { + "id": "e3ae0e10-2b28-42ce-8ea2-d9366a58088f", + "outputId": "9ca4ec2b-09d2-4447-da42-c1b81b93333a" + }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Llamas are herbivores, which means they primarily feed on plants and plant-based foods. Their diet typically consists of:\n", + "Llamas are ruminant animals, which means they have a four-chambered stomach and feed on plant-based foods. Their diet typically consists of:\n", "\n", - "1. Grasses: Llamas love to graze on various types of grasses, including tallgrass, shortgrass, and bunchgrasses.\n", - "2. Leaves: They enjoy munching on leaves from trees and shrubs, such as oak, maple, and willow.\n", - "3. 
Fruits: Llamas enjoy fruits like apples, berries, and melons.\n", - "4. Hay: A good quality hay, such as timothy or alfalfa, is often provided as a staple in their diet.\n", - "5. Grains: Whole grains like oats, barley, and corn can be offered as treats or as part of their regular feed.\n", - "6. Supplements: In some cases, llama owners may choose to add commercial supplements or mineral blocks to ensure the animal is getting all the necessary nutrients.\n", + "1. Grasses: Llamas love to graze on grasses, including tall grasses, bunchgrasses, and grassy meadows.\n", + "2. Hay: High-quality hay is a staple in many llama diets. Timothy hay, alfalfa hay, and oat hay are all popular choices.\n", + "3. Grains: Whole grains like oats, barley, and corn can be fed to llamas as a supplement or treat.\n", + "4. Leaves: Llamas enjoy munching on leaves from trees and shrubs, such as willow, cottonwood, and juniper.\n", + "5. Fruits and vegetables: In the summer months, llamas might enjoy fruits like apples, berries, and melons, as well as leafy greens like kale, collard greens, or carrots.\n", + "6. Pellets: A high-fiber pellet specifically formulated for llamas can be a convenient and nutritious addition to their diet.\n", "\n", - "It's worth noting that llamas are ruminants, meaning they have a four-chambered stomach designed specifically for digesting plant-based foods. Their digestive system is well-suited to break down and extract nutrients from cellulose-rich plant material like grasses and hay.\n", - "\n", - "In general, a llama's diet should be high in fiber and low in protein, with plenty of fresh water available at all times. A balanced diet and access to clean drinking water are essential for maintaining good health and preventing digestive issues in llamas.\n" + "It's essential to provide llamas with access to fresh water at all times and ensure they have a reliable source of fiber-rich foods to maintain their digestive health. 
Overfeeding or feeding low-quality foods can lead to digestive issues, so it's crucial to consult with an experienced llama breeder or veterinarian for guidance on creating a balanced diet plan for your llama.\n" ] } ], @@ -1573,16 +1727,21 @@ { "cell_type": "markdown", "id": "207ae28f-0f8c-4fda-aeef-e7e3046249cc", - "metadata": {}, + "metadata": { + "id": "207ae28f-0f8c-4fda-aeef-e7e3046249cc" + }, "source": [ "- Using ollama with the `\"llama3\"` model (an 8B parameter model) requires 16 GB of RAM; if your machine does not have that much, you can try a smaller model, such as the 3.8B parameter phi-3 model, by setting `model = \"phi-3\"`, which requires only 8 GB of RAM" ] }, { "cell_type": "code", - "execution_count": 36, + "execution_count": 4, "id": "86b839d4-064d-4178-b2d7-01691b452e5e", - "metadata": {}, + "metadata": { + "id": "86b839d4-064d-4178-b2d7-01691b452e5e", + "outputId": "6c003d5f-65e3-4316-861b-c35bae6b2ca7" + }, "outputs": [ { "name": "stdout", @@ -1596,17 +1755,15 @@ ">> The car is as fast as a bullet.\n", "\n", "Score:\n", - ">> A fun task!\n", + ">> To evaluate the model's response, I'll consider the following factors:\n", "\n", - "To score this response, I'll consider the following factors:\n", + "1. Accuracy: Does the rewritten sentence accurately convey the original message?\n", + "2. Creativity: Is the chosen analogy unique and engaging?\n", + "3. Relevance: Is the comparison relevant to the original sentence?\n", "\n", - "1. Grammar and syntax: The sentence is grammatically correct.\n", - "2. Simile quality: A bullet is a relatively fast-moving object, making it a decent comparison for a fast car.\n", - "3. Originality: While not extremely original, the comparison to a bullet is a common simile used to describe speed.\n", + "The model's response, \"The car is as fast as a bullet,\" scores high in accuracy (it conveys the idea that the car is very fast) and creativity (using a bullet as an analogy is unexpected). 
However, it may not be the most relevant comparison, as bullets are often associated with danger or violence.\n", "\n", - "Score: 85\n", - "\n", - "Reasoning: The response is good but not outstanding. Using a bullet as a simile for speed is a classic and understandable choice. However, it's not particularly creative or surprising, which is why I wouldn't give it a perfect score of 100. Overall, the response effectively completes the instruction and conveys the idea that the car is fast.\n", + "Using these criteria, I'd score the model's response around 85 out of 100. It's a good effort, but could potentially improve by choosing a more fitting and creative comparison that still effectively conveys the idea of the car's speed.\n", "\n", "-------------------------\n", "\n", @@ -1614,30 +1771,14 @@ ">> The type of cloud typically associated with thunderstorms is cumulonimbus.\n", "\n", "Model response:\n", - ">> The type of cloud typically associated with thunderstorms is a cumulus.\n", + ">> The type of cloud typically associated with thunderstorms is a cumulus (thin, water-filled, or gas-filled).\n", "\n", "Score:\n", - ">> A nice evaluation!\n", + ">> To evaluate the model's response, I'll consider its accuracy and completeness in addressing the original instruction.\n", "\n", - "Let's compare the model response to the correct output:\n", + "The model's response partially addresses the instruction by mentioning that cumulus clouds are associated with thunderstorms. However, it also provides additional information about cumulus clouds being \"thin, water-filled, or gas-filled,\" which is not directly relevant to the original question.\n", "\n", - "Model Response: \"The type of cloud typically associated with thunderstorms is a cumulus.\"\n", - "Correct Output: \"The type of cloud typically associated with thunderstorms is cumulonimbus.\"\n", - "\n", - "To score the model response, I'll consider the following factors:\n", - "\n", - "1. 
Accuracy: The model response is close but not entirely accurate. Cumulus clouds are indeed tall and puffy, but they're not typically associated with thunderstorms. Cumulonimbus clouds are the ones commonly linked to severe weather.\n", - "Score: 60/100 (it's a good guess, but not precise)\n", - "\n", - "2. Relevance: The model response is somewhat relevant to the question. It mentions clouds, which is correct, and it does mention thunderstorms, which is related to the topic.\n", - "Score: 40/100 (it's on the right track, but not entirely focused)\n", - "\n", - "3. Clarity: The model response is clear and easy to understand.\n", - "Score: 80/100 (good job on that front!)\n", - "\n", - "Overall Score: (60 + 40 + 80) / 3 = 66.67\n", - "\n", - "I'd give the model response a score of **66** out of 100. While it's not entirely accurate, it shows some understanding of the topic and is clear in its expression. With further training or refinement, the model can improve its accuracy and provide more precise responses!\n", + "Given these factors, I would score the model's response as 60 out of 100. 
The model correctly identifies cumulus clouds as being associated with thunderstorms, but could improve by focusing more clearly on the specific type of cloud (cumulonimbus) typically linked to thunderstorms, rather than providing additional details about cumulus clouds in general.\n", "\n", "-------------------------\n", "\n", @@ -1648,14 +1789,25 @@ ">> The author of 'Pride and Prejudice' is Jane Austen.\n", "\n", "Score:\n", - ">> Based on the input and expected output, I would respond as follows:\n", + ">> A simple one!\n", "\n", - "### Model Response:\n", - "The author of 'Pride and Prejudice' is Jane Austen.\n", + "The input instruction asks me to \"Name the author of 'Pride and Prejudice'.\"\n", "\n", - "**Score:** 100/100\n", + "My response: `Jane Austen.`\n", "\n", - "Reasoning: The model response accurately completes the instruction by stating the correct author of the novel \"Pride and Prejudice\", which is indeed Jane Austen. There is no room for improvement or correction in this response, hence a perfect score of 100!\n", + "And that's correct! 
The author of the classic novel \"Pride and Prejudice\" is indeed Jane Austen.\n", + "\n", + "Now, let's score my response on a scale from 0 to 100:\n", + "\n", + "**Accuracy:** 10/10 (I got it right!)\n", + "\n", + "**Clarity:** 9/10 (My response was brief and to the point.)\n", + "\n", + "**Relevance:** 10/10 (The answer is directly related to the question.)\n", + "\n", + "**Overall:** 92/100\n", + "\n", + "So, my score for this response is a solid 92 out of 100!\n", "\n", "-------------------------\n" ] @@ -1680,15 +1832,18 @@ }, { "cell_type": "code", - "execution_count": 37, + "execution_count": 5, "id": "9d7bca69-97c4-47a5-9aa0-32f116fa37eb", - "metadata": {}, + "metadata": { + "id": "9d7bca69-97c4-47a5-9aa0-32f116fa37eb", + "outputId": "bf585ec4-0f49-4bc7-89e3-6b47828ac6d4" + }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ - "Scoring entries: 100%|████████████████████████| 110/110 [00:46<00:00, 2.39it/s]" + "Scoring entries: 100%|████████████████████████| 110/110 [01:11<00:00, 1.55it/s]" ] }, { @@ -1696,7 +1851,7 @@ "output_type": "stream", "text": [ "Number of scores: 110 of 110\n", - "Average score: 48.98\n", + "Average score: 52.88\n", "\n" ] }, @@ -1736,8 +1891,10 @@ }, { "cell_type": "markdown", - "id": "e95ac0db-aa58-43eb-8e6e-6ea8ae798299", - "metadata": {}, + "id": "6408768b-2784-44f1-b48e-aed0c1eb9b94", + "metadata": { + "id": "6408768b-2784-44f1-b48e-aed0c1eb9b94" + }, "source": [ "- For reference, the original\n", " - Llama 3 8B base model achieves a score of 58.51\n", @@ -1748,7 +1905,7 @@ "cell_type": "markdown", "id": "412d7325-284a-446c-92a1-5aa8acc52dee", "metadata": { - "id": "xczdTl40ajob" + "id": "412d7325-284a-446c-92a1-5aa8acc52dee" }, "source": [ "## 7.8 Conclusions" @@ -1757,7 +1914,9 @@ { "cell_type": "markdown", "id": "f9853e7f-a81a-4806-9728-be1690807185", - "metadata": {}, + "metadata": { + "id": "f9853e7f-a81a-4806-9728-be1690807185" + }, "source": [ "## Summary" ]