diff --git a/README.md b/README.md index 7cb3bcc..8045fd6 100644 --- a/README.md +++ b/README.md @@ -58,7 +58,7 @@ Alternatively, you can view this and other files on GitHub at [https://github.co | Appendix B: References and Further Reading | No code | - | | Appendix C: Exercise Solutions | No code | - | | Appendix D: Adding Bells and Whistles to the Training Loop | - [appendix-D.ipynb](appendix-D/01_main-chapter-code/appendix-D.ipynb) | [./appendix-D](./appendix-D) | -| Appendix E: Parameter-efficient Finetuning with LoRA | - Q2 2024 | ... | +| Appendix E: Parameter-efficient Finetuning with LoRA | - [appendix-E.ipynb](appendix-E/01_main-chapter-code/appendix-E.ipynb) | [./appendix-E](./appendix-E) | diff --git a/appendix-E/01_main-chapter-code/appendix-E.ipynb b/appendix-E/01_main-chapter-code/appendix-E.ipynb new file mode 100644 index 0000000..13d546d --- /dev/null +++ b/appendix-E/01_main-chapter-code/appendix-E.ipynb @@ -0,0 +1,1408 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "c024bfa4-1a7a-4751-b5a1-827225a3478b", + "metadata": { + "id": "c024bfa4-1a7a-4751-b5a1-827225a3478b" + }, + "source": [ + "\n", + "Supplementary code for \"Build a Large Language Model From Scratch\": https://www.manning.com/books/build-a-large-language-model-from-scratch by Sebastian Raschka
\n", + "Code repository: https://github.com/rasbt/LLMs-from-scratch\n", + "
" + ] + }, + { + "cell_type": "markdown", + "id": "58b8c870-fb72-490e-8916-d8129bd5d1ff", + "metadata": {}, + "source": [ + "# Appendix E: Parameter-efficient Finetuning with LoRA" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "5b7e01c2-1c84-4f2a-bb51-2e0b74abda90", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "5b7e01c2-1c84-4f2a-bb51-2e0b74abda90", + "outputId": "9495f150-9d79-4910-d6e7-6c0d9aae4a41" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "matplotlib version: 3.7.2\n", + "numpy version: 1.25.2\n", + "tiktoken version: 0.5.1\n", + "torch version: 2.2.2\n", + "tensorflow version: 2.15.0\n", + "pandas version: 2.0.3\n" + ] + } + ], + "source": [ + "from importlib.metadata import version\n", + "\n", + "pkgs = [\"matplotlib\",\n", + " \"numpy\",\n", + " \"tiktoken\",\n", + " \"torch\",\n", + " \"tensorflow\", # For OpenAI's pretrained weights\n", + " \"pandas\" # Dataset loading\n", + " ]\n", + "for p in pkgs:\n", + " print(f\"{p} version: {version(p)}\")" + ] + }, + { + "cell_type": "markdown", + "id": "21532056-0ef4-4c98-82c7-e91f61c6485e", + "metadata": {}, + "source": [ + "## E.1 Introduction to LoRA" + ] + }, + { + "cell_type": "markdown", + "id": "66edc999-3d91-4a1c-a157-9d056392e8d8", + "metadata": {}, + "source": [ + "- No code in this section\n", + "- Low-rank adaptation (LoRA) is a machine learning technique that modifies a pretrained model to better suit a specific, often smaller, dataset by adjusting only a small, low-rank subset of the model's parameters\n", + "- This approach is important because it allows for efficient finetuning of large models on task-specific data, significantly reducing the computational cost and time required for finetuning" + ] + }, + { + "cell_type": "markdown", + "id": "5bb75b5d-d59c-4948-821a-1594a5883dc1", + "metadata": {}, + "source": [ + "- Suppose we have a large weight matrix $W$ for a given layer\n", + "- During backpropagation, we learn a $\\Delta W$ matrix, which contains information on how much we want to update the original weights to minimize the loss function during training\n", + "- In regular training and finetuning, the weight update is defined as follows:\n", + "\n", + "$$W_{\\text{updated}} = W + \\Delta W$$\n", + "\n", + "- The LoRA method proposed by [Hu et al.](https://arxiv.org/abs/2106.09685) offers a more efficient alternative to computing the weight updates $\\Delta W$ by learning an approximation of it, $\\Delta W \\approx AB$.\n", + "- In other words, in LoRA, we have the following, where $A$ and $B$ are two small weight matrices:\n", + "\n", + "$$W_{\\text{updated}} = W + AB$$\n", + "\n", + "- The figure below illustrates these formulas for full finetuning and LoRA side by side\n", + "\n", + "\n", + "\n", + "- If you paid close attention, the full finetuning and LoRA depictions in the figure above look slightly different from the formulas I have shown earlier\n", + "- That's due to the distributive law of matrix multiplication: we don't have to add the weights with the updated weights but can keep them separate\n", + "- For instance, if $x$ is the input data, then we can write the following for regular finetuning:\n", + "\n", + "$$x (W+\\Delta W) = x W + x \\Delta W$$\n", + "\n", + "- Similarly, we can write the following for LoRA:\n", + "\n", + "$$x (W+A B) = x W + x A B$$\n", + "\n", + "- The fact that we can keep the LoRA weight matrices separate makes LoRA especially attractive\n", + "- In practice, this means that we 
don't have to modify the weights of the pretrained model at all, as we can apply the LoRA matrices on the fly\n", + "- After setting up the dataset and loading the model, we we will implement LoRA in code to make these concepts less abstract" + ] + }, + { + "cell_type": "markdown", + "id": "8c7017a2-32aa-4002-a2f3-12aac293ccdf", + "metadata": { + "id": "8c7017a2-32aa-4002-a2f3-12aac293ccdf" + }, + "source": [ + "## E.2 Preparing the dataset" + ] + }, + { + "cell_type": "markdown", + "id": "669c64df-4431-4d27-834d-2bb38a01fc02", + "metadata": {}, + "source": [ + "- This section repeats the code from chapter 6 to load and prepare the dataset\n", + "- Instead of repeating this code, one could copy & paste the LoRA code from section E.3 at the end of the chapter 6 notebook\n", + "- (The LoRA code was originally the last section of chapter 6 but was moved to the appendix due to the length of chapter 6)\n", + "- In similar fashion, we could also apply LoRA to the models in chapter 7 for instruction finetuning" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "def7c09b-af9c-4216-90ce-5e67aed1065c", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "def7c09b-af9c-4216-90ce-5e67aed1065c", + "outputId": "424e4423-f623-443c-ab9e-656f9e867559" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "sms_spam_collection/SMSSpamCollection.tsv already exists. Skipping download and extraction.\n" + ] + } + ], + "source": [ + "from pathlib import Path\n", + "import pandas as pd\n", + "from previous_chapters import (\n", + " download_and_unzip,\n", + " create_balanced_dataset,\n", + " random_split\n", + ")\n", + "\n", + "\n", + "url = \"https://archive.ics.uci.edu/static/public/228/sms+spam+collection.zip\"\n", + "zip_path = \"sms_spam_collection.zip\"\n", + "extracted_path = \"sms_spam_collection\"\n", + "data_file_path = Path(extracted_path) / \"SMSSpamCollection.tsv\"\n", + "\n", + "download_and_unzip(url, zip_path, extracted_path, data_file_path)\n", + "\n", + "df = pd.read_csv(data_file_path, sep=\"\\t\", header=None, names=[\"Label\", \"Text\"])\n", + "balanced_df = create_balanced_dataset(df)\n", + "balanced_df[\"Label\"] = balanced_df[\"Label\"].map({\"ham\": 0, \"spam\": 1})\n", + "\n", + "train_df, validation_df, test_df = random_split(balanced_df, 0.7, 0.1)\n", + "train_df.to_csv(\"train.csv\", index=None)\n", + "validation_df.to_csv(\"validation.csv\", index=None)\n", + "test_df.to_csv(\"test.csv\", index=None)" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "74c3c463-8763-4cc0-9320-41c7eaad8ab7", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "74c3c463-8763-4cc0-9320-41c7eaad8ab7", + "outputId": "b5b48439-32c8-4b37-cca2-c9dc8fa86563" + }, + "outputs": [], + "source": [ + "import torch\n", + "from torch.utils.data import Dataset\n", + "import tiktoken\n", + "from previous_chapters import SpamDataset\n", + "\n", + "\n", + "tokenizer = tiktoken.get_encoding(\"gpt2\")\n", + "train_dataset = SpamDataset(\"train.csv\", max_length=None, tokenizer=tokenizer)\n", + "val_dataset = SpamDataset(\"validation.csv\", max_length=train_dataset.max_length, tokenizer=tokenizer)\n", + "test_dataset = SpamDataset(\"test.csv\", max_length=train_dataset.max_length, tokenizer=tokenizer)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "8681adc0-6f02-4e75-b01a-a6ab75d05542", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": 
"8681adc0-6f02-4e75-b01a-a6ab75d05542", + "outputId": "3266c410-4fdb-4a8c-a142-7f707e2525ab" + }, + "outputs": [], + "source": [ + "from torch.utils.data import DataLoader\n", + "\n", + "num_workers = 0\n", + "batch_size = 8\n", + "\n", + "torch.manual_seed(123)\n", + "\n", + "train_loader = DataLoader(\n", + " dataset=train_dataset,\n", + " batch_size=batch_size,\n", + " shuffle=True,\n", + " num_workers=num_workers,\n", + " drop_last=True,\n", + ")\n", + "\n", + "val_loader = DataLoader(\n", + " dataset=val_dataset,\n", + " batch_size=batch_size,\n", + " num_workers=num_workers,\n", + " drop_last=False,\n", + ")\n", + "\n", + "test_loader = DataLoader(\n", + " dataset=test_dataset,\n", + " batch_size=batch_size,\n", + " num_workers=num_workers,\n", + " drop_last=False,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "ab7335db-e0bb-4e27-80c5-eea11e593a57", + "metadata": {}, + "source": [ + "- As a verification step, we iterate through the data loaders and check that the batches contain 8 training examples each, where each training example consists of 120 tokens" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "4dee6882-4c3a-4964-af15-fa31f86ad047", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Train loader:\n", + "Input batch dimensions: torch.Size([8, 120])\n", + "Label batch dimensions torch.Size([8])\n" + ] + } + ], + "source": [ + "print(\"Train loader:\")\n", + "for input_batch, target_batch in train_loader:\n", + " pass\n", + "\n", + "print(\"Input batch dimensions:\", input_batch.shape)\n", + "print(\"Label batch dimensions\", target_batch.shape)" + ] + }, + { + "cell_type": "markdown", + "id": "5cdd7947-7039-49bf-8a5e-c0a2f4281ca1", + "metadata": {}, + "source": [ + "- Lastly, let's print the total number of batches in each dataset" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "IZfw-TYD2zTj", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "IZfw-TYD2zTj", + "outputId": "6934bbf2-9797-4fbe-d26b-1a246e18c2fb" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "130 training batches\n", + "19 validation batches\n", + "38 test batches\n" + ] + } + ], + "source": [ + "print(f\"{len(train_loader)} training batches\")\n", + "print(f\"{len(val_loader)} validation batches\")\n", + "print(f\"{len(test_loader)} test batches\")" + ] + }, + { + "cell_type": "markdown", + "id": "dec9aa4a-ffd2-4d9f-a835-cce1059fe604", + "metadata": {}, + "source": [ + "## E.3 Initializing the model" + ] + }, + { + "cell_type": "markdown", + "id": "f36ebdaf-810e-46a2-9ad9-e017a04051b1", + "metadata": {}, + "source": [ + "- This section repeats the code from chapter 6 to load and prepare the model" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "02b3a506-3879-4258-82b5-93a5b6bafa74", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "File already exists and is up-to-date: gpt2/124M/checkpoint\n", + "File already exists and is up-to-date: gpt2/124M/encoder.json\n", + "File already exists and is up-to-date: gpt2/124M/hparams.json\n", + "File already exists and is up-to-date: gpt2/124M/model.ckpt.data-00000-of-00001\n", + "File already exists and is up-to-date: gpt2/124M/model.ckpt.index\n", + "File already exists and is up-to-date: gpt2/124M/model.ckpt.meta\n", + "File already exists and is up-to-date: gpt2/124M/vocab.bpe\n" + ] + } + ], + "source": [ + "from gpt_download import 
download_and_load_gpt2\n", + "from previous_chapters import GPTModel, load_weights_into_gpt\n", + "\n", + "\n", + "CHOOSE_MODEL = \"gpt2-small (124M)\"\n", + "INPUT_PROMPT = \"Every effort moves\"\n", + "\n", + "BASE_CONFIG = {\n", + " \"vocab_size\": 50257, # Vocabulary size\n", + " \"context_length\": 1024, # Context length\n", + " \"drop_rate\": 0.0, # Dropout rate\n", + " \"qkv_bias\": True # Query-key-value bias\n", + "}\n", + "\n", + "model_configs = {\n", + " \"gpt2-small (124M)\": {\"emb_dim\": 768, \"n_layers\": 12, \"n_heads\": 12},\n", + " \"gpt2-medium (355M)\": {\"emb_dim\": 1024, \"n_layers\": 24, \"n_heads\": 16},\n", + " \"gpt2-large (774M)\": {\"emb_dim\": 1280, \"n_layers\": 36, \"n_heads\": 20},\n", + " \"gpt2-xl (1558M)\": {\"emb_dim\": 1600, \"n_layers\": 48, \"n_heads\": 25},\n", + "}\n", + "\n", + "BASE_CONFIG.update(model_configs[CHOOSE_MODEL])\n", + "\n", + "model_size = CHOOSE_MODEL.split(\" \")[-1].lstrip(\"(\").rstrip(\")\")\n", + "settings, params = download_and_load_gpt2(model_size=model_size, models_dir=\"gpt2\")\n", + "\n", + "model = GPTModel(BASE_CONFIG)\n", + "load_weights_into_gpt(model, params)\n", + "model.eval();" + ] + }, + { + "cell_type": "markdown", + "id": "252614cd-7ce6-4908-83e6-3761f519904e", + "metadata": {}, + "source": [ + "- To ensure that the model was loaded corrected, let's double-check that it generates coherent text" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "8b6ce20c-0700-4783-8be0-4cf17c200a7f", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Every effort moves you forward.\n", + "\n", + "The first step is to understand the importance of your work\n" + ] + } + ], + "source": [ + "from previous_chapters import (\n", + " generate_text_simple,\n", + " text_to_token_ids,\n", + " token_ids_to_text\n", + ")\n", + "\n", + "\n", + "text_1 = \"Every effort moves you\"\n", + "\n", + "token_ids = generate_text_simple(\n", + " model=model,\n", + " idx=text_to_token_ids(text_1, tokenizer),\n", + " max_new_tokens=15,\n", + " context_size=BASE_CONFIG[\"context_length\"]\n", + ")\n", + "\n", + "print(token_ids_to_text(token_ids, tokenizer))" + ] + }, + { + "cell_type": "markdown", + "id": "8174b31b-1ab5-4115-b01c-245369da5af3", + "metadata": {}, + "source": [ + "- Then, we prepare the model for classification finetuning similar to chapter 6, where we replace the output layer" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "e255ce91-d73a-4854-90a4-95804928eb16", + "metadata": {}, + "outputs": [], + "source": [ + "torch.manual_seed(123)\n", + "\n", + "num_classes = 2\n", + "model.out_head = torch.nn.Linear(in_features=768, out_features=num_classes)" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "02e6f057-1383-4ece-8444-0a88e71ac75d", + "metadata": {}, + "outputs": [], + "source": [ + "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n", + "model.to(device); # no assignment model = model.to(device) necessary for nn.Module classes" + ] + }, + { + "cell_type": "markdown", + "id": "8e951cd6-5e42-44d2-b21f-895cb61004fe", + "metadata": {}, + "source": [ + "- Lastly, let's calcuate the initial classification accuracy of the non-finetuning model (we expect this to be around 50%, which means that the model is not able to reliably distinguish between spam and non-spam messages, yet)" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "fc7dd72c-73a2-4881-ade0-0a9605f1ab8c", + "metadata": {}, + 
"outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Training accuracy: 46.25%\n", + "Validation accuracy: 45.00%\n", + "Test accuracy: 48.75%\n" + ] + } + ], + "source": [ + "from previous_chapters import calc_accuracy_loader\n", + "\n", + "\n", + "torch.manual_seed(123)\n", + "train_accuracy = calc_accuracy_loader(train_loader, model, device, num_batches=10)\n", + "val_accuracy = calc_accuracy_loader(val_loader, model, device, num_batches=10)\n", + "test_accuracy = calc_accuracy_loader(test_loader, model, device, num_batches=10)\n", + "\n", + "print(f\"Training accuracy: {train_accuracy*100:.2f}%\")\n", + "print(f\"Validation accuracy: {val_accuracy*100:.2f}%\")\n", + "print(f\"Test accuracy: {test_accuracy*100:.2f}%\")" + ] + }, + { + "cell_type": "markdown", + "id": "398a1ec9-e2a1-43d6-bf9f-12ee54b46a7b", + "metadata": { + "id": "398a1ec9-e2a1-43d6-bf9f-12ee54b46a7b" + }, + "source": [ + "## E.4 Parameter-efficient finetuning with LoRA" + ] + }, + { + "cell_type": "markdown", + "id": "652a4a82-61ef-4d0a-9858-8988e844f12c", + "metadata": {}, + "source": [ + "- We begin by initializing a LoRALayer that creates the matrices $A$ and $B$, along with the `alpha` scaling hyperparameter and the `rank` ($r$) hyperparameters\n", + "- This layer can accept an input and compute the corresponding output, as illustrated in the figure below\n", + "\n", + "\n", + "\n", + "In code, this LoRA layer depicted in the figure above looks like as follows" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "2ds9ywjMwvIW", + "metadata": { + "id": "2ds9ywjMwvIW" + }, + "outputs": [], + "source": [ + "class LoRALayer(torch.nn.Module):\n", + " def __init__(self, in_dim, out_dim, rank, alpha):\n", + " super().__init__()\n", + " std_dev = 1 / torch.sqrt(torch.tensor(rank).float())\n", + " self.A = torch.nn.Parameter(torch.randn(in_dim, rank) * std_dev)\n", + " self.B = torch.nn.Parameter(torch.zeros(rank, out_dim))\n", + " self.alpha = alpha\n", + "\n", + " def forward(self, x):\n", + " x = self.alpha * (x @ self.A @ self.B)\n", + " return x" + ] + }, + { + "cell_type": "markdown", + "id": "ad21faa8-0614-4257-93cd-68952193e14a", + "metadata": {}, + "source": [ + "- In the code above, `rank` is a hyperparameter that controls the inner dimension of the matrices $A$ and $B$\n", + "- In other words, this parameter controls the number of additional parameters introduced by LoRA and is a key factor in determining the balance between model adaptability and parameter efficiency\n", + "- The second hyperparameter, alpha, is a scaling hyperparameter applied to the output of the low-rank adaptation\n", + "- It essentially controls the extent to which the adapted layer's output is allowed to influence the original output of the layer being adapted\n", + "- This can be seen as a way to regulate the impact of the low-rank adaptation on the layer's output\n", + "- So far, the `LoRALayer` class we implemented above allows us to transform the layer inputs $x$\n", + "- However, in LoRA, we are usually interested in replacing existing `Linear` layers so that the weight update is applied to the existing pretrained weights, as shown in the figure below\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "3e6d5da0-dfce-4808-b89b-29ff333f563f", + "metadata": {}, + "source": [ + "- To incorporate the original `Linear` layer weights as shown in the figure above, we implement a `LinearWithLoRA` layer below that uses the previously implemented LoRALayer and can be used to replace 
existing `Linear` layers in a neural network, for example, the self-attention module or feed forward modules in an LLM" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "127d3a64-8359-4b21-b056-78d58cc75fe8", + "metadata": {}, + "outputs": [], + "source": [ + "class LinearWithLoRA(torch.nn.Module):\n", + " def __init__(self, linear, rank, alpha):\n", + " super().__init__()\n", + " self.linear = linear\n", + " self.lora = LoRALayer(\n", + " linear.in_features, linear.out_features, rank, alpha\n", + " )\n", + "\n", + " def forward(self, x):\n", + " return self.linear(x) + self.lora(x)" + ] + }, + { + "cell_type": "markdown", + "id": "e1145a90-35ff-462c-820b-15483fa5b051", + "metadata": {}, + "source": [ + "- Note that since we initialize the weight matrix $B$ (`self.B` in `LoraLayer`) with zero values in the LoRA layer, the matrix multiplication between $A$ and $B$ results in a matrix consisting of 0's and doesn't affect the original weights (since adding 0 to the original weights does not modify them)" + ] + }, + { + "cell_type": "markdown", + "id": "e98a6d36-7bc9-434c-a7f1-533f26aff06d", + "metadata": { + "id": "4D21Jk7Vw3nG" + }, + "source": [ + "- To try LoRA on the GPT model we defined earlier, we define a `replace_linear_with_lora` function to replace all `Linear` layers in the model with the new `LinearWithLoRA` layers" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "WlQZ8ygqzN_g", + "metadata": { + "id": "WlQZ8ygqzN_g" + }, + "outputs": [], + "source": [ + "def replace_linear_with_lora(model, rank, alpha):\n", + " for name, module in model.named_children():\n", + " if isinstance(module, torch.nn.Linear):\n", + " # Replace the Linear layer with LinearWithLoRA\n", + " setattr(model, name, LinearWithLoRA(module, rank, alpha))\n", + " else:\n", + " # Recursively apply the same function to child modules\n", + " replace_linear_with_lora(module, rank, alpha)" + ] + }, + { + "cell_type": "markdown", + "id": "8c172164-cdde-4489-b7d7-aaed9cc2f5f2", + "metadata": {}, + "source": [ + "- We then freeze the original model parameter and use the `replace_linear_with_lora` to replace the said `Linear` layers below" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "dbe15350-4da9-4829-9d23-98bbd3d0b1a1", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Total trainable parameters before: 124,441,346\n", + "Total trainable parameters after: 0\n" + ] + } + ], + "source": [ + "total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)\n", + "print(f\"Total trainable parameters before: {total_params:,}\")\n", + "\n", + "for param in model.parameters():\n", + " param.requires_grad = False\n", + "\n", + "total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)\n", + "print(f\"Total trainable parameters after: {total_params:,}\")" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "mLk_fPq0yz_u", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "mLk_fPq0yz_u", + "outputId": "7ba89607-ca75-4718-e8dc-9cdc44c3e410" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Total trainable LoRA parameters: 1,333,264\n" + ] + } + ], + "source": [ + "replace_linear_with_lora(model, rank=8, alpha=8)\n", + "\n", + "total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)\n", + "print(f\"Total trainable LoRA parameters: {total_params:,}\")" + ] + }, + { + "cell_type": 
"markdown", + "id": "b8b6819e-ef7a-4f0d-841a-1b467496bef9", + "metadata": {}, + "source": [ + "- As we can see, we reduced the number of trainable parameters by almost 100x when using LoRA\n", + "- Let's now double-check whether the layers have been modified as intended by printing the model architecture" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "1711be61-bb2c-466f-9b5b-24f4aa5ccd9c", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "GPTModel(\n", + " (tok_emb): Embedding(50257, 768)\n", + " (pos_emb): Embedding(1024, 768)\n", + " (drop_emb): Dropout(p=0.0, inplace=False)\n", + " (trf_blocks): Sequential(\n", + " (0): TransformerBlock(\n", + " (att): MultiHeadAttention(\n", + " (W_query): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (W_key): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (W_value): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (out_proj): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (dropout): Dropout(p=0.0, inplace=False)\n", + " )\n", + " (ff): FeedForward(\n", + " (layers): Sequential(\n", + " (0): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=3072, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (1): GELU()\n", + " (2): LinearWithLoRA(\n", + " (linear): Linear(in_features=3072, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " )\n", + " )\n", + " (norm1): LayerNorm()\n", + " (norm2): LayerNorm()\n", + " (drop_resid): Dropout(p=0.0, inplace=False)\n", + " )\n", + " (1): TransformerBlock(\n", + " (att): MultiHeadAttention(\n", + " (W_query): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (W_key): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (W_value): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (out_proj): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (dropout): Dropout(p=0.0, inplace=False)\n", + " )\n", + " (ff): FeedForward(\n", + " (layers): Sequential(\n", + " (0): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=3072, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (1): GELU()\n", + " (2): LinearWithLoRA(\n", + " (linear): Linear(in_features=3072, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " )\n", + " )\n", + " (norm1): LayerNorm()\n", + " (norm2): LayerNorm()\n", + " (drop_resid): Dropout(p=0.0, inplace=False)\n", + " )\n", + " (2): TransformerBlock(\n", + " (att): MultiHeadAttention(\n", + " (W_query): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (W_key): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (W_value): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): 
LoRALayer()\n", + " )\n", + " (out_proj): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (dropout): Dropout(p=0.0, inplace=False)\n", + " )\n", + " (ff): FeedForward(\n", + " (layers): Sequential(\n", + " (0): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=3072, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (1): GELU()\n", + " (2): LinearWithLoRA(\n", + " (linear): Linear(in_features=3072, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " )\n", + " )\n", + " (norm1): LayerNorm()\n", + " (norm2): LayerNorm()\n", + " (drop_resid): Dropout(p=0.0, inplace=False)\n", + " )\n", + " (3): TransformerBlock(\n", + " (att): MultiHeadAttention(\n", + " (W_query): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (W_key): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (W_value): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (out_proj): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (dropout): Dropout(p=0.0, inplace=False)\n", + " )\n", + " (ff): FeedForward(\n", + " (layers): Sequential(\n", + " (0): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=3072, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (1): GELU()\n", + " (2): LinearWithLoRA(\n", + " (linear): Linear(in_features=3072, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " )\n", + " )\n", + " (norm1): LayerNorm()\n", + " (norm2): LayerNorm()\n", + " (drop_resid): Dropout(p=0.0, inplace=False)\n", + " )\n", + " (4): TransformerBlock(\n", + " (att): MultiHeadAttention(\n", + " (W_query): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (W_key): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (W_value): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (out_proj): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (dropout): Dropout(p=0.0, inplace=False)\n", + " )\n", + " (ff): FeedForward(\n", + " (layers): Sequential(\n", + " (0): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=3072, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (1): GELU()\n", + " (2): LinearWithLoRA(\n", + " (linear): Linear(in_features=3072, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " )\n", + " )\n", + " (norm1): LayerNorm()\n", + " (norm2): LayerNorm()\n", + " (drop_resid): Dropout(p=0.0, inplace=False)\n", + " )\n", + " (5): TransformerBlock(\n", + " (att): MultiHeadAttention(\n", + " (W_query): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (W_key): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (W_value): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, 
out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (out_proj): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (dropout): Dropout(p=0.0, inplace=False)\n", + " )\n", + " (ff): FeedForward(\n", + " (layers): Sequential(\n", + " (0): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=3072, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (1): GELU()\n", + " (2): LinearWithLoRA(\n", + " (linear): Linear(in_features=3072, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " )\n", + " )\n", + " (norm1): LayerNorm()\n", + " (norm2): LayerNorm()\n", + " (drop_resid): Dropout(p=0.0, inplace=False)\n", + " )\n", + " (6): TransformerBlock(\n", + " (att): MultiHeadAttention(\n", + " (W_query): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (W_key): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (W_value): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (out_proj): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (dropout): Dropout(p=0.0, inplace=False)\n", + " )\n", + " (ff): FeedForward(\n", + " (layers): Sequential(\n", + " (0): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=3072, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (1): GELU()\n", + " (2): LinearWithLoRA(\n", + " (linear): Linear(in_features=3072, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " )\n", + " )\n", + " (norm1): LayerNorm()\n", + " (norm2): LayerNorm()\n", + " (drop_resid): Dropout(p=0.0, inplace=False)\n", + " )\n", + " (7): TransformerBlock(\n", + " (att): MultiHeadAttention(\n", + " (W_query): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (W_key): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (W_value): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (out_proj): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (dropout): Dropout(p=0.0, inplace=False)\n", + " )\n", + " (ff): FeedForward(\n", + " (layers): Sequential(\n", + " (0): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=3072, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (1): GELU()\n", + " (2): LinearWithLoRA(\n", + " (linear): Linear(in_features=3072, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " )\n", + " )\n", + " (norm1): LayerNorm()\n", + " (norm2): LayerNorm()\n", + " (drop_resid): Dropout(p=0.0, inplace=False)\n", + " )\n", + " (8): TransformerBlock(\n", + " (att): MultiHeadAttention(\n", + " (W_query): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (W_key): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (W_value): LinearWithLoRA(\n", + " 
(linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (out_proj): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (dropout): Dropout(p=0.0, inplace=False)\n", + " )\n", + " (ff): FeedForward(\n", + " (layers): Sequential(\n", + " (0): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=3072, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (1): GELU()\n", + " (2): LinearWithLoRA(\n", + " (linear): Linear(in_features=3072, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " )\n", + " )\n", + " (norm1): LayerNorm()\n", + " (norm2): LayerNorm()\n", + " (drop_resid): Dropout(p=0.0, inplace=False)\n", + " )\n", + " (9): TransformerBlock(\n", + " (att): MultiHeadAttention(\n", + " (W_query): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (W_key): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (W_value): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (out_proj): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (dropout): Dropout(p=0.0, inplace=False)\n", + " )\n", + " (ff): FeedForward(\n", + " (layers): Sequential(\n", + " (0): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=3072, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (1): GELU()\n", + " (2): LinearWithLoRA(\n", + " (linear): Linear(in_features=3072, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " )\n", + " )\n", + " (norm1): LayerNorm()\n", + " (norm2): LayerNorm()\n", + " (drop_resid): Dropout(p=0.0, inplace=False)\n", + " )\n", + " (10): TransformerBlock(\n", + " (att): MultiHeadAttention(\n", + " (W_query): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (W_key): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (W_value): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (out_proj): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (dropout): Dropout(p=0.0, inplace=False)\n", + " )\n", + " (ff): FeedForward(\n", + " (layers): Sequential(\n", + " (0): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=3072, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (1): GELU()\n", + " (2): LinearWithLoRA(\n", + " (linear): Linear(in_features=3072, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " )\n", + " )\n", + " (norm1): LayerNorm()\n", + " (norm2): LayerNorm()\n", + " (drop_resid): Dropout(p=0.0, inplace=False)\n", + " )\n", + " (11): TransformerBlock(\n", + " (att): MultiHeadAttention(\n", + " (W_query): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (W_key): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " 
(W_value): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (out_proj): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (dropout): Dropout(p=0.0, inplace=False)\n", + " )\n", + " (ff): FeedForward(\n", + " (layers): Sequential(\n", + " (0): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=3072, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " (1): GELU()\n", + " (2): LinearWithLoRA(\n", + " (linear): Linear(in_features=3072, out_features=768, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + " )\n", + " )\n", + " (norm1): LayerNorm()\n", + " (norm2): LayerNorm()\n", + " (drop_resid): Dropout(p=0.0, inplace=False)\n", + " )\n", + " )\n", + " (final_norm): LayerNorm()\n", + " (out_head): LinearWithLoRA(\n", + " (linear): Linear(in_features=768, out_features=2, bias=True)\n", + " (lora): LoRALayer()\n", + " )\n", + ")\n" + ] + } + ], + "source": [ + "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n", + "model.to(device)\n", + "\n", + "print(model)" + ] + }, + { + "cell_type": "markdown", + "id": "c4bbc9d7-65ec-4675-bab8-2e56eb0cfb55", + "metadata": {}, + "source": [ + "- Based on the model architecture above, we can see that the model now contains our new `LinearWithLoRA` layers\n", + "- Also, since we initialized matrix $B$ with 0's, we expect the initial model performance to be unchanged compared to before" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "DAlrb_I00VEU", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "DAlrb_I00VEU", + "outputId": "3dae5ff0-316d-408e-c8dc-2b8c60f9b994" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Training accuracy: 46.25%\n", + "Validation accuracy: 45.00%\n", + "Test accuracy: 48.75%\n" + ] + } + ], + "source": [ + "torch.manual_seed(123)\n", + "train_accuracy = calc_accuracy_loader(train_loader, model, device, num_batches=10)\n", + "val_accuracy = calc_accuracy_loader(val_loader, model, device, num_batches=10)\n", + "test_accuracy = calc_accuracy_loader(test_loader, model, device, num_batches=10)\n", + "\n", + "print(f\"Training accuracy: {train_accuracy*100:.2f}%\")\n", + "print(f\"Validation accuracy: {val_accuracy*100:.2f}%\")\n", + "print(f\"Test accuracy: {test_accuracy*100:.2f}%\")" + ] + }, + { + "cell_type": "markdown", + "id": "13735b3e-f0c3-4dba-ae3d-4141b2878101", + "metadata": {}, + "source": [ + "- Let's now get to the interesting part and finetune the model reusing the training function from chapter 6\n", + "- The training takes about 15 minutes on a M3 MacBook Air laptop computer and less than half a minute on a V100 or A100 GPU" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "wCParRvr0eff", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "wCParRvr0eff", + "outputId": "b86fd5f4-1527-4549-e0b0-9dff37836f0a" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Ep 1 (Step 000000): Train loss 2.849, Val loss 2.565\n", + "Ep 1 (Step 000050): Train loss 0.515, Val loss 0.465\n", + "Ep 1 (Step 000100): Train loss 0.191, Val loss 0.423\n", + "Training accuracy: 97.50% | Validation accuracy: 97.50%\n", + "Ep 2 (Step 000150): Train loss 0.170, Val loss 0.072\n", + "Ep 2 (Step 000200): Train loss 0.014, Val loss 0.087\n", + "Ep 2 (Step 000250): 
Train loss 0.027, Val loss 0.197\n", + "Training accuracy: 100.00% | Validation accuracy: 92.50%\n", + "Ep 3 (Step 000300): Train loss 0.014, Val loss 0.321\n", + "Ep 3 (Step 000350): Train loss 0.015, Val loss 0.146\n", + "Training accuracy: 100.00% | Validation accuracy: 97.50%\n", + "Ep 4 (Step 000400): Train loss 0.008, Val loss 0.103\n", + "Ep 4 (Step 000450): Train loss 0.010, Val loss 0.178\n", + "Ep 4 (Step 000500): Train loss 0.097, Val loss 0.056\n", + "Training accuracy: 100.00% | Validation accuracy: 97.50%\n", + "Ep 5 (Step 000550): Train loss 0.032, Val loss 0.091\n", + "Ep 5 (Step 000600): Train loss 0.002, Val loss 0.058\n", + "Training accuracy: 100.00% | Validation accuracy: 100.00%\n", + "Ep 6 (Step 000650): Train loss 0.001, Val loss 0.009\n", + "Ep 6 (Step 000700): Train loss 0.001, Val loss 0.039\n", + "Ep 6 (Step 000750): Train loss 0.000, Val loss 0.038\n", + "Training accuracy: 100.00% | Validation accuracy: 95.00%\n", + "Training completed in 13.70 minutes.\n" + ] + } + ], + "source": [ + "import time\n", + "from previous_chapters import train_classifier_simple\n", + "\n", + "\n", + "start_time = time.time()\n", + "\n", + "torch.manual_seed(123)\n", + "\n", + "optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5, weight_decay=0.1)\n", + "\n", + "num_epochs = 6\n", + "train_losses, val_losses, train_accs, val_accs, examples_seen = train_classifier_simple(\n", + " model, train_loader, val_loader, optimizer, device,\n", + " num_epochs=num_epochs, eval_freq=50, eval_iter=5,\n", + " tokenizer=tokenizer\n", + ")\n", + "\n", + "end_time = time.time()\n", + "execution_time_minutes = (end_time - start_time) / 60\n", + "print(f\"Training completed in {execution_time_minutes:.2f} minutes.\")" + ] + }, + { + "cell_type": "markdown", + "id": "d0c89e82-3aa8-44c6-b046-0b16200b8e6c", + "metadata": {}, + "source": [ + "- Finally, let's evaluate the model" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "bawWGijA0iF3", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 307 + }, + "id": "bawWGijA0iF3", + "outputId": "4b05b245-ffac-4d36-881b-8306a4da6b75" + }, + "outputs": [ + { + "data": { + "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAdwAAAEiCAYAAABTO2OcAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8pXeV/AAAACXBIWXMAAA9hAAAPYQGoP6dpAABPd0lEQVR4nO3dd3hUVfrA8e9Mkpn03ikBJECAEEJdRBAlUlRWsMCyqEFRFw0iIoqsCog/DXYsLAoq6FpiAxcVqVIU6b2E0AKhpICQSurM+f1xk0mGACaQzKS8n+e5T+aWufc9Icw759xzz9EppRRCCCGEqFV6ewcghBBCNAaScIUQQggbkIQrhBBC2IAkXCGEEMIGJOEKIYQQNiAJVwghhLABSbhCCCGEDUjCFUIIIWxAEq4QQghhA5JwhWiE+vXrx4QJE+wdhhCNiiRcIa7C6NGj0el0lZZBgwbZOzQhRB3laO8AhKivBg0axPz58622GY1GO0UjhKjrpIYrxFUyGo0EBwdbLT4+PgCsWbMGg8HAb7/9Zjn+tddeIzAwkPT0dACWLl3KDTfcgLe3N35+ftx+++0cOXLEcvyxY8fQ6XR888039OnTBxcXF7p3787BgwfZsmUL3bp1w93dncGDB3PmzBnL+0aPHs3QoUN58cUXCQgIwNPTk7Fjx1JUVHTZshQWFjJp0iSaNGmCm5sbPXv2ZM2aNZb9x48fZ8iQIfj4+ODm5kaHDh1YsmTJZc/3n//8h/DwcJydnQkKCuLuu++27DObzcTHx9OyZUtcXFyIioriu+++s3r/3r17GTx4MO7u7gQFBXHfffdx9uxZy/5+/foxfvx4nnnmGXx9fQkODmb69OmXjUeIukASrhC1oOwe6X333UdWVhY7duzghRde4KOPPiIoKAiAvLw8Jk6cyNatW1m1ahV6vZ5hw4ZhNputzjVt2jSef/55tm/fjqOjI//85z955plneOedd/jtt984fPgwU6dOtXrPqlWrSExMZM2aNXz11VcsXLiQF1988bLxjhs3jg0bNpCQkMDu3bu55557GDRoEIcOHQIgLi6OwsJC1q1bx549e3j11Vdxd3e/5Lm2bt3K+PHjmTFjBklJSSxdupS+ffta9sfHx/PZZ5/xwQcfsG/fPp588knuvfde1q5dC0BmZiY333wz0dHRbN26laVLl5Kens7w4cOtrvPpp5/i5ubGpk2beO2115gxYwYrVqyo4r+QEHaghBDVFhsbqxwcHJSbm5vV8vLLL1uOKSwsVJ07d1bDhw9X7du3Vw8//PAVz3nmzBkFqD179iillEpOTlaA+uijjyzHfPXVVwpQq1atsmyLj49Xbdu2tYrN19dX5eXlWbbNmTNHubu7K5PJpJRS6sYbb1RPPPGEUkqp48ePKwcHB3Xq1CmrePr376+mTJmilFIqMjJSTZ8+vUq/m++//155enqq7OzsSvsKCgqUq6ur+uOPP6y2jxkzRo0cOVIppdRLL72kBgwYYLX/xIkTClBJSUmW+G+44QarY7p3764mT55cpRiFsAe5hyvEVbrpppuYM2eO1TZfX1/La4PBwBdffEGnTp0ICwvj7bfftjr20KFDTJ06lU2bNnH27FlLzTYlJYWOHTtajuvUqZPldVntODIy0mpbRkaG1bmjoqJwdXW1rPfq1Yvc3FxOnDhBWFiY1bF79uzBZDLRpk0bq+2FhYX4+fkBMH78eB599FGWL19OTEwMd911l1VcFd1yyy2EhYXRqlUrBg0axKBBgxg2bBiurq4cPnyYCxcucMstt1i9p6ioiOjoaAB27drF6tWrL1mDPnLkiCXOi68fEhJS6fcgRF0iCVeIq+Tm5kbr1q2veMwff/wBwLlz5zh37hxubm6WfUOGDCEsLIx58+YRGhqK2WymY8eOle61Ojk5WV7rdLpLbru4Gbo6cnNzcXBwYNu2bTg4OFjtK0t6Dz30EAMHDuTnn39m+fLlxMfH8+abb/L4449XOp+Hhwfbt29nzZo1LF++nKlTpzJ9+nS2bNlCbm4uAD///DNNmjSxel9Zh7Pc3FyGDBnCq6++WuncISEhltcVfwdw7b8HIWqbJFwhasmRI0d48sknmTdvHl9//TWxsbGsXLkSvV7Pn3/+SVJSEvPmzaNPnz4A/P777zV27V27dpGfn4+LiwsAGzduxN3dnWbNmlU6Njo6GpPJREZGhiWWS2nWrBljx45l7NixTJkyhXnz5l0y4QI4OjoSExNDTEwM06ZNw9vbm19//ZVbbrkFo9FISkoKN9544yXf26VLF77//ntatGiBo6N8RImGQ/6ahbhKhYWFpKWlWW1zdHTE398fk8nEvffey8CBA3nggQcYNGgQkZGRvPnmmzz99NP4+Pjg5+fH3LlzCQkJISUlhWeffbbGYisqKmLMmDE8//zzHDt2jGnTpjFu3Dj0+sr9JNu0acOoUaO4//77efPNN4mOjubMmTOsWrWKTp06cdtttzFhwgQGDx5MmzZtOH/+PKtXryYiIuKS1/7pp584evQoffv2xcfHhyVLlmA2m2nbti0eHh5MmjSJJ598ErPZzA033EBWVhbr16/H09OT2NhY4uLimDdvHiNHjrT0Qj58+DAJCQl89NFHlWrhQtQXknCFuEpLly61auIEaNu2LQcOHODll1/m+PHj/PTTT4DWFDp37lxGjhzJgAEDiIqKIiEhgfHjx9OxY0fatm3Lu+++S79+/Woktv79+xMeHk7fvn0pLCxk5MiRV3xsZv78+fzf//0fTz31FKdOncLf35+//e1v3H777QCYTCbi4uI4efIknp6eDBo0qNI96TLe3t4sXLiQ6dOnU1BQQHh4OF999RUdOnQA4KWXXiIgIID4+HiOHj2Kt7c3Xbp04d///jcAoaGhrF+/nsmTJzNgwAAKCwsJCwtj0KBBl/zCIER9oVNKKXsHIYSoOaNHjyYzM5MffvjB3qEIISqQr4tCCCGEDUjCFUIIIWxAmpSFEEIIG5AarhBCCGEDknCFEEIIG5CEK4QQQtiAJNxSs2fPpkWLFjg7O9OzZ082b95s75CqZN26dQwZMoTQ0FB0Ol2lR0GUUkydOpWQkBBcXFyIiYmxzABT5ty5c4waNQpPT0+8vb0ZM2aMZQi+Mrt376ZPnz44OzvTrFkzXnvttdou2mXFx8fTvXt3PDw8CAwMZOjQoSQlJVkdU1BQQFxcHH5+fri7u3PXXXdZpsUrk5KSwm233YarqyuBgYE8/fTTlJSUWB2zZs0aunTpgtFopHXr1ixYsKC2i3dJc+bMoVOnTnh6euLp6UmvXr345ZdfLPsbWnkvNnPmTHQ6HRMmTLBsa4hlnj59Ojqdzmpp166dZX9DLDPAqVOnuPfee/Hz88PFxYXIyEi2bt1q2d9gPsfsOXNCXZGQkKAMBoP65JNP1L59+9TDDz+svL29VXp6ur1D+0tLlixRzz33nFq4cKEC1KJFi6z2z5w5U3l5eakffvhB7dq1S/39739XLVu2VPn5+ZZjBg0apKKiotTGjRvVb7/9plq3bm2ZuUUppbKyslRQUJ
AaNWqU2rt3r/rqq6+Ui4uL+vDDD21VTCsDBw5U8+fPV3v37lU7d+5Ut956q2revLnKzc21HDN27FjVrFkztWrVKrV161b1t7/9TV1//fWW/SUlJapjx44qJiZG7dixQy1ZskT5+/tbZsdRSqmjR48qV1dXNXHiRLV//3713nvvKQcHB7V06VKbllcppRYvXqx+/vlndfDgQZWUlKT+/e9/KycnJ7V3794GWd6KNm/erFq0aKE6depkmeFIqYZZ5mnTpqkOHTqo1NRUy3LmzBnL/oZY5nPnzqmwsDA1evRotWnTJnX06FG1bNkydfjwYcsxDeVzTBKuUqpHjx4qLi7Osm4ymVRoaKiKj4+3Y1TVd3HCNZvNKjg4WL3++uuWbZmZmcpoNKqvvvpKKaXU/v37FaC2bNliOeaXX35ROp3OMl3bf/7zH+Xj46MKCwstx0yePNlqSjh7ysjIUIBau3atUkoro5OTk/r2228txyQmJipAbdiwQSmlfVHR6/UqLS3NcsycOXOUp6enpZzPPPOM6tChg9W1RowYoQYOHFjbRaoSHx8f9dFHHzXo8ubk5Kjw8HC1YsUKqykFG2qZp02bpqKioi65r6GWefLkyZWmWqyoIX2ONfom5aKiIrZt20ZMTIxlm16vJyYmhg0bNtgxsmuXnJxMWlqaVdm8vLzo2bOnpWwbNmzA29ubbt26WY6JiYlBr9ezadMmyzF9+/bFYDBYjhk4cCBJSUmcP3/eRqW5vKysLKB8arxt27ZRXFxsVe527drRvHlzq3JHRkZaprsDrUzZ2dns27fPckzFc5QdY++/C5PJREJCAnl5efTq1atBlzcuLo7bbrutUlwNucyHDh0iNDSUVq1aMWrUKFJSUoCGW+bFixfTrVs37rnnHgIDA4mOjmbevHmW/Q3pc6zRJ9yzZ89iMpms/kBBm2P04oHp65uy+K9UtrS0NAIDA632Ozo64uvra3XMpc5R8Rr2YjabmTBhAr1797bMIZuWlobBYMDb29vq2IvL/Vdlutwx2dnZ5Ofn10ZxrmjPnj24u7tjNBoZO3YsixYton379g22vAkJCWzfvp34+PhK+xpqmXv27MmCBQtYunQpc+bMITk5mT59+pCTk9Ngy3z06FHmzJlDeHg4y5Yt49FHH2X8+PF8+umnVnE3hM8xmbxA1GtxcXHs3bu3Rqe2q6vatm3Lzp07ycrK4rvvviM2Npa1a9faO6xaceLECZ544glWrFiBs7OzvcOxmcGDB1ted+rUiZ49exIWFsY333xjmWqxoTGbzXTr1o1XXnkF0KaL3Lt3Lx988AGxsbF2jq5mNfoarr+/Pw4ODpV6+qWnpxMcHGynqGpGWfxXKltwcDAZGRlW+0tKSjh37pzVMZc6R8Vr2MO4ceP46aefWL16NU2bNrVsDw4OpqioiMzMTKvjLy73X5Xpcsd4enra5cPPYDDQunVrunbtSnx8PFFRUbzzzjsNsrzbtm0jIyODLl264OjoiKOjI2vXruXdd9/F0dGRoKCgBlfmS/H29qZNmzYcPny4Qf47gzaTVvv27a22RUREWJrSG9LnWKNPuAaDga5du7Jq1SrLNrPZzKpVq+jVq5cdI7t2LVu2JDg42Kps2dnZbNq0yVK2Xr16kZmZybZt2yzH/Prrr5jNZnr27Gk5Zt26dRQXF1uOWbFiBW3btsXHx8dGpSmnlGLcuHEsWrSIX3/9lZYtW1rt79q1K05OTlblTkpKIiUlxarce/bssfpPumLFCjw9PS3/+Xv16mV1jrJj6srfhdlsprCwsEGWt3///uzZs4edO3dalm7dujFq1CjL64ZW5kvJzc3lyJEjhISENMh/Z4DevXtXeqzv4MGDhIWFAQ3sc8xm3bPqsISEBGU0GtWCBQvU/v371SOPPKK8vb2tevrVVTk5OWrHjh1qx44dClBvvfWW2rFjhzp+/LhSSutO7+3trf73v/+p3bt3qzvuuOOS3emjo6PVpk2b1O+//67Cw8OtutNnZmaqoKAgdd9996m9e/eqhIQE5erqarfHgh599FHl5eWl1qxZY/X4xIULFyzHjB07VjVv3lz9+uuvauvWrapXr16qV69elv1lj08MGDBA7dy5Uy1dulQFBARc8vGJp59+WiUmJqrZs2fb7fGJZ599Vq1du1YlJyer3bt3q2effVbpdDq1fPnyBlneS6nYS1mphlnmp556Sq1Zs0YlJyer9evXq5iYGOXv768yMjKUUg2zzJs3b1aOjo7q5ZdfVocOHVJffPGFcnV1VZ9//rnlmIbyOSYJt9R7772nmjdvrgwGg+rRo4fauHGjvUOqktWrVyug0hIbG6uU0rrUv/DCCyooKEgZjUbVv39/lZSUZHWOP//8U40cOVK5u7srT09P9cADD6icnByrY3bt2qVuuOEGZTQaVZMmTdTMmTNtVcRKLlVeQM2fP99yTH5+vnrssceUj4+PcnV1VcOGDVOpqalW5zl27JgaPHiwcnFxUf7+/uqpp55SxcXFVsesXr1ade7cWRkMBtWqVSura9jSgw8+qMLCwpTBYFABAQGqf//+lmSrVMMr76VcnHAbYplHjBihQkJClMFgUE2aNFEjRoyweh61IZZZKaV+/PFH1bFjR2U0GlW7du3U3LlzrfY3lM8xmS1ICCGEsIFGfw9XCCGEsAVJuEIIIYQNSMIVQgghbEASrhBCCGEDknCFEEIIG5CEK4QQQtiAJNxShYWFTJ8+ncLCQnuHYjNS5sZBytw4SJnrPnkOt1R2djZeXl5kZWXh6elp73BsQsosZW6opMxS5rpIarhCCCGEDUjCFUIIIWygXs+HW1JSwo4dOwgKCkKvv7bvDjk5OQCcOnWK7OzsmgivzpMyS5kbKimzlNmWzGYz6enpREdH4+h4+bRar+/hbtmyhR49etg7DCGEEILNmzfTvXv3y+6v1zXcoKAgQCtkSEiInaMRQgjRGKWmptKjRw9LTrqcep1wy5qRQ0JCaNq0qZ2jEUII0Zj91a1N6TQlhBBC2IAkXCGEEMIGJOEKIYQQNlCv7+EKIcSVmEwmiouL7R2GqOecnJxwcHC45vNIwgWUUuxPzeZAag63dQrB2enaf7FCCPtRSpGWlkZmZqa9QxENhLe3N8HBweh0uqs+hyTcUvd+tInzF4ppE+RBZFMve4cjhLgGZck2MDAQV1fXa/qQFI2bUooLFy6QkZEBcE2PoErCBXQ6He2CPdlw9E8SU7Ml4QpRj5lMJkuy9fPzs3c4ogFwcXEBICMjg8DAwKtuXpZOU6UiQrSZJhLTGseQaEI0VGX3bF1dXe0ciWhIyv6erqVPgCTcUhEhHgAkpkrCFaIhkGZkUZNq4u9JEm4pSw03NYd6PLy0EEKIOkoSbqnWge446HVk5ReTmlVg73CEEOKatWjRglmzZlX5+DVr1qDT6Wq9d/eCBQvw9vau1WvURZJwSzk7OXBdgBsAB+Q+rhDChnQ63RWX6dOnX9V5t2zZwiOPPFLl46+//npSU1Px8pKOo7VBeilXEBHiycH0XBJTc7i53ZVnfRBCiJqSm
ppqef31118zdepUkpKSLNvc3d0tr5VSmEymK867WiYgIKBacRgMBoKDg6v1HlF1UsOtoOw+7n7pOCWEsKHg4GDL4uXlhU6ns6wfOHAADw8PfvnlF7p27YrRaOT333/nyJEj3HHHHQQFBeHu7k737t1ZuXKl1XkvblLW6XR89NFHDBs2DFdXV8LDw1m8eLFl/8VNymVNv8uWLSMiIgJ3d3cGDRpk9QWhpKSE8ePH4+3tjZ+fH5MnTyY2NpahQ4dW63cwZ84crrvuOgwGA23btuW///2vZZ9SiunTp9O8eXOMRiOhoaGMHz/esv8///kP4eHhODs7ExQUxN13312ta9uKJNwKyjtOScIVoiFRSnGhqMTmS012wHz22WeZOXMmiYmJdOrUidzcXG699VZWrVrFjh07GDRoEEOGDCElJeWK53nxxRcZPnw4u3fv5tZbb2XUqFGcO3fussdfuHCBN954g//+97+sW7eOlJQUJk2aZNn/6quv8sUXXzB//nzWr19PdnY2P/zwQ7XKtmjRIp544gmeeuop9u7dy7/+9S8eeOABVq9eDcD333/P22+/zYcffsihQ4f44YcfiIyMBGDr1q2MHz+eGTNmkJSUxNKlS+nbt2+1rm8r0qRcQUSw9mjQsbN55BeZcDHIEI9CNAT5xSbaT11m8+vunzEQV0PNfMzOmDGDW265xbLu6+tLVFSUZf2ll15i0aJFLF68mHHjxl32PKNHj2bkyJEAvPLKK7z77rts3ryZQYMGXfL44uJiPvjgA6677joAxo0bx4wZMyz733vvPaZMmcKwYcMAeP/991myZEm1yvbGG28wevRoHnvsMQAmTpzIxo0beeONN7jppptISUkhODiYmJgYnJycaN68OT169AAgJSUFNzc3br/9djw8PAgLCyM6Orpa17cVqeFWEOBhxM/NgFnBwfQce4cjhBAW3bp1s1rPzc1l0qRJRERE4O3tjbu7O4mJiX9Zw+3UqZPltZubG56enpZhCy/F1dXVkmxBG9qw7PisrCzS09MtyQ/AwcGBrl27VqtsiYmJ9O7d22pb7969SUxMBOCee+4hPz+fVq1a8fDDD7No0SJKSkoAuOWWWwgLC6NVq1bcd999fPHFF1y4cKFa17cVqeFWoNPpiAjx5PfDZ0lMzSaqmbe9QxJC1AAXJwf2zxhol+vWFDc3N6v1SZMmsWLFCt544w1at26Ni4sLd999N0VFRVc8j5OTk9W6TqfDbDZX63hbj1XQrFkzkpKSWLlyJStWrOCxxx7j9ddfZ+3atXh4eLB9+3bWrFnD8uXLmTp1KtOnT2fLli117tEjqeFeREacEqLh0el0uBocbb7U5mhX69evZ/To0QwbNozIyEiCg4M5duxYrV3vUry8vAgKCmLLli2WbSaTie3bt1frPBEREaxfv95q2/r162nfvr1l3cXFhSFDhvDuu++yZs0aNmzYwJ49ewBwdHQkJiaG1157jd27d3Ps2DF+/fXXayhZ7ZAa7kXaBZePOCWEEHVVeHg4CxcuZMiQIeh0Ol544YUr1lRry+OPP058fDytW7emXbt2vPfee5w/f75aXzaefvpphg8fTnR0NDExMfz4448sXLjQ0ut6wYIFmEwmevbsiaurK59//jkuLi6EhYXx008/cfToUfr27YuPjw9LlizBbDbTtm3b2iryVZOEe5GKkxgopWQ8ViFEnfTWW2/x4IMPcv311+Pv78/kyZPJzrZ9y9zkyZNJS0vj/vvvx8HBgUceeYSBAwdWa0adoUOH8s477/DGG2/wxBNP0LJlS+bPn0+/fv0AbS7amTNnMnHiREwmE5GRkfz444/4+fnh7e3NwoULmT59OgUFBYSHh/PVV1/RoUOHWirx1dOpejxw8MmTJ2nWrBknTpygadOmNXLOohIzHaYtpdik+H3yTTT1kRlHhKhPCgoKSE5OpmXLljg7O9s7nEbHbDYTERHB8OHDeemll+wdTo250t9VVXOR1HDLpO+Hk1swtOzDdQHuHEjLITE1RxKuEEJcwfHjx1m+fDk33ngjhYWFvP/++yQnJ/PPf/7T3qHVOdJpqsyKqfDjeDi8ivYyAIYQQlSJXq9nwYIFdO/end69e7Nnzx5WrlxJRESEvUOrc6SGWyY0Gg6vgNM7aRdyM+yQhCuEEH+lWbNmlXoYi0uTGm6Z0NKRSU7vsHScOpAmPZWFEELUDEm4ZcoS7plEIvy1iv+xP/O4UFRix6CEEEI0FJJwy3iGgHswKDP+OQcJ8DCilNRyhRBC1AxJuBVVaFZuFywjTgkhhKg5knArqpBwy3oqH5ARp4QQQtQASbgVXaLjlNRwhRBC1ARJuBWFdtZ+nj1Iez/tV3MgLQezud4OxiWEaET69evHhAkTLOstWrRg1qxZV3yPTqer9oTxtXmeK5k+fTqdO3eu1WvUJkm4FbkHgmdTQNGq5DAGBz25hSWcPJ9v78iEEA3YkCFDLjsB/G+//YZOp2P37t3VPu+WLVt45JFHrjU8K5dLeqmpqQwePLhGr9XQSMK9WGkt1zFtF60D3QFtIgMhhKgtY8aMYcWKFZw8ebLSvvnz59OtWzerieOrKiAgAFdX2wxPGxwcjNFotMm16itJuBeT+7hCCBu7/fbbCQgIYMGCBVbbc3Nz+fbbbxkzZgx//vknI0eOpEmTJri6uhIZGclXX311xfNe3KR86NAh+vbti7OzM+3bt2fFihWV3jN58mTatGmDq6srrVq14oUXXqC4uBjQpsl78cUX2bVrFzqdDp1OZ4n54iblPXv2cPPNN+Pi4oKfnx+PPPIIubm5lv2jR49m6NChvPHGG4SEhODn50dcXJzlWlVhNpuZMWMGTZs2xWg00rlzZ5YuXWrZX1RUxLhx4wgJCcHZ2ZmwsDDi4+MBUEoxffp0mjdvjtFoJDQ0lPHjx1f52ldDhna8WNNuEBwJvq2IcJJHg4RoUIryqv8eByM4lH5UmkrAVAg6PTi5XPm8BrcqX8LR0ZH777+fBQsW8Nxzz1mmBf32228xmUyMHDmS3NxcunbtyuTJk/H09OTnn3/mvvvu47rrrqNHjx5/eQ2z2cydd95JUFAQmzZtIisry+p+bxkPDw8WLFhAaGgoe/bs4eGHH8bDw4NnnnmGESNGsHfvXpYuXWqZq9bLy6vSOfLy8hg4cCC9evViy5YtZGRk8NBDDzFu3DirLxWrV68mJCSE1atXc/jwYUaMGEHnzp15+OGHq/R7e+edd3jzzTf58MMPiY6O5pNPPuHvf/87+/btIzw8nHfffZfFixfzzTff0Lx5c06cOMGJEycA+P7773n77bdJSEigQ4cOpKWlsWvXripd92pJwr1Yq34w9ncA2h8+C8hk9EI0GK+EVv899yyADsO01wd+hG9HQ9gN8MDP5cfMioQLf1q/b3pWtS7z4IMP8vrrr7N27VrLPLDz58/nrrvuwsvLCy8vLyZNmmQ5/vHHH2fZsmV88803VUq4K1eu5MCBAyxbtozQUO338Morr1S67/r8889bXrdo0YJJkyaRkJDAM888g4uLC+7u7jg6
OhIcHHzZa3355ZcUFBTw2Wef4eamffF4//33GTJkCK+++ipBQUEA+Pj48P777+Pg4EC7du247bbbWLVqVZUT7htvvMHkyZP5xz/+AcCrr77K6tWrmTVrFrNnzyYlJYXw8HBuuOEGdDodYWFhlvempKQQHBxMTEwMTk5ONG/evEq/x2shTcpX0K60STnl3AVyCqrezCGEENXVrl07rr/+ej755BMADh8+zG+//caYMWMAMJlMvPTSS0RGRuLr64u7uzvLli0jJSWlSudPTEykWbNmlmQL0KtXr0rHff311/Tu3Zvg4GDc3d15/vnnq3yNiteKioqyJFuA3r17YzabSUpKsmzr0KGD1UT1ISEhZGRkVOka2dnZnD59mt69e1tt7927N4mJiYDWbL1z507atm3L+PHjWb58ueW4e+65h/z8fFq1asXDDz/MokWLKCmp3aF8pYZ7OSVF+JJNkKeR9OxCDqbn0DXM195RCSGuxb9PV/89DhU6ArUbop1Dd1FdZcKea4ur1JgxY3j88ceZPXs28+fP57rrruPGG28E4PXXX+edd95h1qxZREZG4ubmxoQJEygqKqqRawNs2LCBUaNG8eKLLzJw4EC8vLxISEjgzTffrLFrVOTk5GS1rtPpMJvNNXb+Ll26kJyczC+//MLKlSsZPnw4MTExfPfddzRr1oykpCRWrlzJihUreOyxxywtDBfHVVOkhnspO7+E+Cbw81OWjlP7pVlZiPrP4Fb9xaFCvcTBUdtW8f7t5c57FYYPH45er+fLL7/ks88+48EHH7Tcz12/fj133HEH9957L1FRUbRq1YqDBw9W+dwRERGcOHGC1NRUy7aNGzdaHfPHH38QFhbGc889R7du3QgPD+f48ePWRTUYMJlMf3mtXbt2kZdXfm97/fr16PV62rZtW+WYr8TT05PQ0NBKUwOuX7+e9u3bWx03YsQI5s2bx9dff83333/PuXPnAHBxcWHIkCG8++67rFmzhg0bNrBnT818eboUqeFeimcTMBXBuSNEtPBkTdIZ6TglhKh17u7ujBgxgilTppCdnc3o0aMt+8LDw/nuu+/4448/8PHx4a233iI9Pd0quVxJTEwMbdq0ITY2ltdff53s7Gyee+45q2PCw8NJSUkhISGB7t278/PPP7No0SKrY1q0aEFycjI7d+6kadOmeHh4VHocaNSoUUybNo3Y2FimT5/OmTNnePzxx7nvvvss929rwtNPP820adO47rrr6Ny5M/Pnz2fnzp188cUXALz11luEhIQQHR2NXq/n22+/JTg4GG9vbxYsWIDJZKJnz564urry+eef4+LiYnWft6ZJDfdSmvWA8TvhX7/Jo0FCCJsaM2YM58+fZ+DAgVb3W59//nm6dOnCwIED6devH8HBwQwdOrTK59Xr9SxatIj8/Hx69OjBQw89xMsvv2x1zN///neefPJJxo0bR+fOnfnjjz944YUXrI656667GDRoEDfddBMBAQGXfDTJ1dWVZcuWce7cObp3787dd99N//79ef/996v3y/gL48ePZ+LEiTz11FNERkaydOlSFi9eTHh4OKD1uH7ttdfo1q0b3bt359ixYyxZsgS9Xo+3tzfz5s2jd+/edOrUiZUrV/Ljjz/i5+dXozFWpFNK1dtxC0+ePEmzZs04ceIETZs2rZVrHErP4Za31+FqcGDv9IHo9bpauY4QomYUFBSQnJxMy5YtcXZ2tnc4ooG40t9VVXOR1HD/Qkt/NwyOei4UmUg5d8He4QghhKinJOFezqlt8M39OC59mrZBMgCGEEKIayMJ93KKC2D//yBpKREhknCFEEJcG0m4lxPSCdBB9kk6+2iDXsijQUIIIa6WJNzLMXqAfxsAop2OAXBAZg0SQghxlSThXknpzEEtC7WHy0+ezydbhngUol6oyRGLhKiJvycZ+OJKQqNhdwLOZ3YT6tWT01kFHEjNoUdLGeJRiLrKYDCg1+s5ffo0AQEBGAwGy2hNQlSXUoqioiLOnDmDXq/HYDBc9bkk4V7JRXPjns4qIDE1WxKuEHWYXq+nZcuWpKamcvr0VYydLMQluLq60rx5c/T6q28YloR7JcGR2iDluWl0bVvAKuQ+rhD1gcFgoHnz5pSUlPzluL9C/BUHBwccHR2vuaXErgk3Pj6ehQsXcuDAAVxcXLj++ut59dVXa2xw62tmcIWACMjYR3en44CP9FQWop7Q6XQ4OTnV2swvQlSXXTtNrV27lri4ODZu3MiKFSsoLi5mwIABVjNM2F1ps/J1JYcASErLxmSut6NhCiGEsBO71nCXLl1qtb5gwQICAwPZtm0bffv2tVNUFwntDDs/xydzH85Of6Og2MyxP/O4LsDd3pEJIYSoR+rUY0FZWVkA+PpeulNSYWEh2dnZliUnxwbNu6FdANCd3kHbQC3JyohTQgghqqvOJFyz2cyECRPo3bs3HTt2vOQx8fHxeHl5WZaqzgN5TYI6gN4RLpzlb/4FAByQ+7hCCCGqqc4k3Li4OPbu3UtCQsJlj5kyZQpZWVmWZf/+/bUfmJMz3PIS3PMpzZtoc1NKDVcIIUR11YnHgsaNG8dPP/3EunXrrjiXoNFoxGg0Wtazs22U+Ho9BkB48jngmCRcIYQQ1WbXGq5SinHjxrFo0SJ+/fVXWrZsac9w/lK70lmDTmcVkHmhyM7RCCGEqE/smnDj4uL4/PPP+fLLL/Hw8CAtLY20tDTy8/PtGVZlpmI48iue2/5DEy9nAA6kyX1cIYQQVWfXhDtnzhyysrLo168fISEhluXrr7+2Z1iVKQVfjoAVU+kToD0jLM3KQgghqsOu93CVqicDSDgaoO1g0DvS1skZDhdLwhVCCFEtdaLTVL0w/DMAgvekwobtJMqjQUIIIaqhzjwWVF+0C/EE4GB6DiUmmW9TCCFE1UjCrQ6zmTB1GjeDjsISbYhHIYQQoiok4VaV2QxvtUM/uxv9/LXmZJk5SAghRFVJwq0qvR68wwDo634SkJ7KQgghqk4SbnWUTtUXqTsKSMIVQghRdZJwq6M04TYrSAJkEgMhhBBVJwm3OkoTrvu5/egxk5ZdwPk8GeJRCCHEX5OEWx3+4eDkhq44j97e5wFpVhZCCFE1knCrQ+8AIVEA3OypdZzaLwlXCCFEFUjCra7QzgB0djgGyCQGQgghqkYSbnWV3sdtUXgQkCZlIYQQVSMJt7pKE65XViIOmDiUnkuxDPEohBDiL0jCrS7f68Dggd5UQJQxjSKTmaNnZIhHIYQQVyYJt7r0est93P5epwBpVhZCCPHXJOFejdKE294lE4DENEm4QgghruyqEu6JEyc4efKkZX3z5s1MmDCBuXPn1lhgdVrvCfBsCqe7TASQuXGFEEL8patKuP/85z9ZvXo1AGlpadxyyy1s3ryZ5557jhkzZtRogHWSmz84exFROjeuNCkLIYT4K1eVcPfu3UuPHj0A+Oabb+jYsSN//PEHX3zxBQsWLKjJ+Oq0tkEe6HRwJqeQs7mF9g5HCCFEHXZVCbe4uBij0QjAypUr+fvf/w5Au3btSE1Nrbn
[... remainder of the base64-encoded PNG for the loss plot omitted ...]",
+      "text/plain": [
+       "<Figure size 500x300 with 2 Axes>"
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from previous_chapters import plot_values\n", + "\n", + "epochs_tensor = torch.linspace(0, num_epochs, len(train_losses))\n", + "examples_seen_tensor = torch.linspace(0, examples_seen, len(train_losses))\n", + "\n", + "plot_values(epochs_tensor, examples_seen_tensor, train_losses, val_losses, label=\"loss\")" + ] + }, + { + "cell_type": "markdown", + "id": "aa074723-e3f7-4f7e-a267-855531a037dc", + "metadata": {}, + "source": [ + "- Note that we previously calculated the accuracy values on 10 batches only; below we calculate the accuracies on the full dataset" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "1D2awlEq0gZi", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "1D2awlEq0gZi", + "outputId": "b482af19-5ebd-45b9-a9f0-99f621203ef9" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Training accuracy: 100.00%\n", + "Validation accuracy: 96.64%\n", + "Test accuracy: 98.00%\n" + ] + } + ], + "source": [ + "from previous_chapters import calc_accuracy_loader\n", + "\n", + "train_accuracy = calc_accuracy_loader(train_loader, model, device)\n", + "val_accuracy = calc_accuracy_loader(val_loader, model, device)\n", + "test_accuracy = calc_accuracy_loader(test_loader, model, device)\n", + "\n", + "print(f\"Training accuracy: {train_accuracy*100:.2f}%\")\n", + "print(f\"Validation accuracy: {val_accuracy*100:.2f}%\")\n", + "print(f\"Test accuracy: {test_accuracy*100:.2f}%\")" + ] + }, + { + "cell_type": "markdown", + "id": "1f87f5e6-339e-4fcf-900b-6d845d3c713d", + "metadata": {}, + "source": [ + "- As we can based on the relatively high accuracy values above, the LoRA finetuning was successful" + ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "gpuType": "V100", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.4" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/appendix-E/01_main-chapter-code/gpt_download.py b/appendix-E/01_main-chapter-code/gpt_download.py new file mode 100644 index 0000000..0d695d2 --- /dev/null +++ b/appendix-E/01_main-chapter-code/gpt_download.py @@ -0,0 +1,99 @@ +# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt). 
+# Source for "Build a Large Language Model From Scratch" +# - https://www.manning.com/books/build-a-large-language-model-from-scratch +# Code: https://github.com/rasbt/LLMs-from-scratch + + +import os +import requests +import json +import numpy as np +import tensorflow as tf +from tqdm import tqdm + + +def download_and_load_gpt2(model_size, models_dir): + # Validate model size + allowed_sizes = ("124M", "355M", "774M", "1558M") + if model_size not in allowed_sizes: + raise ValueError(f"Model size not in {allowed_sizes}") + + # Define paths + model_dir = os.path.join(models_dir, model_size) + base_url = "https://openaipublic.blob.core.windows.net/gpt-2/models" + filenames = [ + "checkpoint", "encoder.json", "hparams.json", + "model.ckpt.data-00000-of-00001", "model.ckpt.index", + "model.ckpt.meta", "vocab.bpe" + ] + + # Download files + os.makedirs(model_dir, exist_ok=True) + for filename in filenames: + file_url = os.path.join(base_url, model_size, filename) + file_path = os.path.join(model_dir, filename) + download_file(file_url, file_path) + + # Load settings and params + tf_ckpt_path = tf.train.latest_checkpoint(model_dir) + settings = json.load(open(os.path.join(model_dir, "hparams.json"))) + params = load_gpt2_params_from_tf_ckpt(tf_ckpt_path, settings) + + return settings, params + + +def download_file(url, destination): + # Send a GET request to download the file in streaming mode + response = requests.get(url, stream=True) + + # Get the total file size from headers, defaulting to 0 if not present + file_size = int(response.headers.get("content-length", 0)) + + # Check if file exists and has the same size + if os.path.exists(destination): + file_size_local = os.path.getsize(destination) + if file_size == file_size_local: + print(f"File already exists and is up-to-date: {destination}") + return + + # Define the block size for reading the file + block_size = 1024 # 1 Kilobyte + + # Initialize the progress bar with total file size + progress_bar_description = url.split("/")[-1] # Extract filename from URL + with tqdm(total=file_size, unit="iB", unit_scale=True, desc=progress_bar_description) as progress_bar: + # Open the destination file in binary write mode + with open(destination, "wb") as file: + # Iterate over the file data in chunks + for chunk in response.iter_content(block_size): + progress_bar.update(len(chunk)) # Update progress bar + file.write(chunk) # Write the chunk to the file + + +def load_gpt2_params_from_tf_ckpt(ckpt_path, settings): + # Initialize parameters dictionary with empty blocks for each layer + params = {"blocks": [{} for _ in range(settings["n_layer"])]} + + # Iterate over each variable in the checkpoint + for name, _ in tf.train.list_variables(ckpt_path): + # Load the variable and remove singleton dimensions + variable_array = np.squeeze(tf.train.load_variable(ckpt_path, name)) + + # Process the variable name to extract relevant parts + variable_name_parts = name.split("/")[1:] # Skip the 'model/' prefix + + # Identify the target dictionary for the variable + target_dict = params + if variable_name_parts[0].startswith("h"): + layer_number = int(variable_name_parts[0][1:]) + target_dict = params["blocks"][layer_number] + + # Recursively access or create nested dictionaries + for key in variable_name_parts[1:-1]: + target_dict = target_dict.setdefault(key, {}) + + # Assign the variable array to the last key + last_key = variable_name_parts[-1] + target_dict[last_key] = variable_array + + return params diff --git 
diff --git a/appendix-E/01_main-chapter-code/previous_chapters.py b/appendix-E/01_main-chapter-code/previous_chapters.py
new file mode 100644
index 0000000..b6fca51
--- /dev/null
+++ b/appendix-E/01_main-chapter-code/previous_chapters.py
@@ -0,0 +1,542 @@
+# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).
+# Source for "Build a Large Language Model From Scratch"
+# - https://www.manning.com/books/build-a-large-language-model-from-scratch
+# Code: https://github.com/rasbt/LLMs-from-scratch
+#
+# This file collects all the relevant code that we covered thus far
+# throughout Chapters 2-6.
+# This file can be run as a standalone script.
+
+import os
+from pathlib import Path
+import urllib.request
+import zipfile
+
+import matplotlib.pyplot as plt
+import numpy as np
+import pandas as pd
+import tiktoken
+import torch
+import torch.nn as nn
+from torch.utils.data import Dataset, DataLoader
+
+
+#####################################
+# Chapter 2
+#####################################
+
+
+class GPTDatasetV1(Dataset):
+    def __init__(self, txt, tokenizer, max_length, stride):
+        self.tokenizer = tokenizer
+        self.input_ids = []
+        self.target_ids = []
+
+        # Tokenize the entire text
+        token_ids = tokenizer.encode(txt)
+
+        # Use a sliding window to chunk the book into overlapping sequences of max_length
+        for i in range(0, len(token_ids) - max_length, stride):
+            input_chunk = token_ids[i:i + max_length]
+            target_chunk = token_ids[i + 1: i + max_length + 1]
+            self.input_ids.append(torch.tensor(input_chunk))
+            self.target_ids.append(torch.tensor(target_chunk))
+
+    def __len__(self):
+        return len(self.input_ids)
+
+    def __getitem__(self, idx):
+        return self.input_ids[idx], self.target_ids[idx]
+
+
+def create_dataloader_v1(txt, batch_size=4, max_length=256,
+                         stride=128, shuffle=True, drop_last=True):
+    # Initialize the tokenizer
+    tokenizer = tiktoken.get_encoding("gpt2")
+
+    # Create dataset
+    dataset = GPTDatasetV1(txt, tokenizer, max_length, stride)
+
+    # Create dataloader
+    dataloader = DataLoader(
+        dataset, batch_size=batch_size, shuffle=shuffle, drop_last=drop_last)
+
+    return dataloader
+
+
+#####################################
+# Chapter 3
+#####################################
+class MultiHeadAttention(nn.Module):
+    def __init__(self, d_in, d_out, context_length, dropout, num_heads, qkv_bias=False):
+        super().__init__()
+        assert d_out % num_heads == 0, "d_out must be divisible by num_heads"
+
+        self.d_out = d_out
+        self.num_heads = num_heads
+        self.head_dim = d_out // num_heads  # Reduce the projection dim to match desired output dim
+
+        self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias)
+        self.W_key = nn.Linear(d_in, d_out, bias=qkv_bias)
+        self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias)
+        self.out_proj = nn.Linear(d_out, d_out)  # Linear layer to combine head outputs
+        self.dropout = nn.Dropout(dropout)
+        self.register_buffer('mask', torch.triu(torch.ones(context_length, context_length), diagonal=1))
+
+    def forward(self, x):
+        b, num_tokens, d_in = x.shape
+
+        keys = self.W_key(x)  # Shape: (b, num_tokens, d_out)
+        queries = self.W_query(x)
+        values = self.W_value(x)
+
+        # We implicitly split the matrix by adding a `num_heads` dimension
+        # Unroll last dim: (b, num_tokens, d_out) -> (b, num_tokens, num_heads, head_dim)
+        keys = keys.view(b, num_tokens, self.num_heads, self.head_dim)
+        values = values.view(b, num_tokens, self.num_heads, self.head_dim)
+        queries = queries.view(b, num_tokens, self.num_heads, self.head_dim)
+
+        # Transpose: (b, num_tokens, num_heads, head_dim) -> (b, num_heads, num_tokens, head_dim)
+        keys = keys.transpose(1, 2)
+        queries = queries.transpose(1, 2)
+        values = values.transpose(1, 2)
+
+        # Compute scaled dot-product attention (aka self-attention) with a causal mask
+        attn_scores = queries @ keys.transpose(2, 3)  # Dot product for each head
+
+        # Original mask truncated to the number of tokens and converted to boolean
+        mask_bool = self.mask.bool()[:num_tokens, :num_tokens]
+
+        # Use the mask to fill attention scores
+        attn_scores.masked_fill_(mask_bool, -torch.inf)
+
+        attn_weights = torch.softmax(attn_scores / keys.shape[-1]**0.5, dim=-1)
+        attn_weights = self.dropout(attn_weights)
+
+        # Shape: (b, num_tokens, num_heads, head_dim)
+        context_vec = (attn_weights @ values).transpose(1, 2)
+
+        # Combine heads, where self.d_out = self.num_heads * self.head_dim
+        context_vec = context_vec.reshape(b, num_tokens, self.d_out)
+        context_vec = self.out_proj(context_vec)  # optional projection
+
+        return context_vec
+
+
+#####################################
+# Chapter 4
+#####################################
+class LayerNorm(nn.Module):
+    def __init__(self, emb_dim):
+        super().__init__()
+        self.eps = 1e-5
+        self.scale = nn.Parameter(torch.ones(emb_dim))
+        self.shift = nn.Parameter(torch.zeros(emb_dim))
+
+    def forward(self, x):
+        mean = x.mean(dim=-1, keepdim=True)
+        var = x.var(dim=-1, keepdim=True, unbiased=False)
+        norm_x = (x - mean) / torch.sqrt(var + self.eps)
+        return self.scale * norm_x + self.shift
+
+
+class GELU(nn.Module):
+    def __init__(self):
+        super().__init__()
+
+    def forward(self, x):
+        return 0.5 * x * (1 + torch.tanh(
+            torch.sqrt(torch.tensor(2.0 / torch.pi)) *
+            (x + 0.044715 * torch.pow(x, 3))
+        ))
+
+
+class FeedForward(nn.Module):
+    def __init__(self, cfg):
+        super().__init__()
+        self.layers = nn.Sequential(
+            nn.Linear(cfg["emb_dim"], 4 * cfg["emb_dim"]),
+            GELU(),
+            nn.Linear(4 * cfg["emb_dim"], cfg["emb_dim"]),
+        )
+
+    def forward(self, x):
+        return self.layers(x)
+
+
+class TransformerBlock(nn.Module):
+    def __init__(self, cfg):
+        super().__init__()
+        self.att = MultiHeadAttention(
+            d_in=cfg["emb_dim"],
+            d_out=cfg["emb_dim"],
+            context_length=cfg["context_length"],
+            num_heads=cfg["n_heads"],
+            dropout=cfg["drop_rate"],
+            qkv_bias=cfg["qkv_bias"])
+        self.ff = FeedForward(cfg)
+        self.norm1 = LayerNorm(cfg["emb_dim"])
+        self.norm2 = LayerNorm(cfg["emb_dim"])
+        self.drop_resid = nn.Dropout(cfg["drop_rate"])
+
+    def forward(self, x):
+        # Shortcut connection for attention block
+        shortcut = x
+        x = self.norm1(x)
+        x = self.att(x)  # Shape [batch_size, num_tokens, emb_size]
+        x = self.drop_resid(x)
+        x = x + shortcut  # Add the original input back
+
+        # Shortcut connection for feed-forward block
+        shortcut = x
+        x = self.norm2(x)
+        x = self.ff(x)
+        x = self.drop_resid(x)
+        x = x + shortcut  # Add the original input back
+
+        return x
+
+
+class GPTModel(nn.Module):
+    def __init__(self, cfg):
+        super().__init__()
+        self.tok_emb = nn.Embedding(cfg["vocab_size"], cfg["emb_dim"])
+        self.pos_emb = nn.Embedding(cfg["context_length"], cfg["emb_dim"])
+        self.drop_emb = nn.Dropout(cfg["drop_rate"])
+
+        self.trf_blocks = nn.Sequential(
+            *[TransformerBlock(cfg) for _ in range(cfg["n_layers"])])
+
+        self.final_norm = LayerNorm(cfg["emb_dim"])
+        self.out_head = nn.Linear(cfg["emb_dim"], cfg["vocab_size"], bias=False)
+
+    def forward(self, in_idx):
+        batch_size, seq_len = in_idx.shape
+        tok_embeds = self.tok_emb(in_idx)
+        pos_embeds = self.pos_emb(torch.arange(seq_len, device=in_idx.device))
+        x = tok_embeds + pos_embeds  # Shape [batch_size, num_tokens, emb_size]
+        x = self.drop_emb(x)
+        x = self.trf_blocks(x)
+        x = self.final_norm(x)
+        logits = self.out_head(x)
+        return logits
+
+
+def generate_text_simple(model, idx, max_new_tokens, context_size):
+    # idx is (B, T) array of indices in the current context
+    for _ in range(max_new_tokens):
+
+        # Crop current context if it exceeds the supported context size
+        # E.g., if LLM supports only 5 tokens, and the context size is 10
+        # then only the last 5 tokens are used as context
+        idx_cond = idx[:, -context_size:]
+
+        # Get the predictions
+        with torch.no_grad():
+            logits = model(idx_cond)
+
+        # Focus only on the last time step
+        # (batch, n_token, vocab_size) becomes (batch, vocab_size)
+        logits = logits[:, -1, :]
+
+        # Get the idx of the vocab entry with the highest logits value
+        idx_next = torch.argmax(logits, dim=-1, keepdim=True)  # (batch, 1)
+
+        # Append sampled index to the running sequence
+        idx = torch.cat((idx, idx_next), dim=1)  # (batch, n_tokens+1)
+
+    return idx
+
+
+#####################################
+# Chapter 5
+#####################################
+def assign(left, right):
+    if left.shape != right.shape:
+        raise ValueError(f"Shape mismatch. Left: {left.shape}, Right: {right.shape}")
+    return torch.nn.Parameter(torch.tensor(right))
+
+
+def load_weights_into_gpt(gpt, params):
+    gpt.pos_emb.weight = assign(gpt.pos_emb.weight, params['wpe'])
+    gpt.tok_emb.weight = assign(gpt.tok_emb.weight, params['wte'])
+
+    for b in range(len(params["blocks"])):
+        q_w, k_w, v_w = np.split(
+            (params["blocks"][b]["attn"]["c_attn"])["w"], 3, axis=-1)
+        gpt.trf_blocks[b].att.W_query.weight = assign(
+            gpt.trf_blocks[b].att.W_query.weight, q_w.T)
+        gpt.trf_blocks[b].att.W_key.weight = assign(
+            gpt.trf_blocks[b].att.W_key.weight, k_w.T)
+        gpt.trf_blocks[b].att.W_value.weight = assign(
+            gpt.trf_blocks[b].att.W_value.weight, v_w.T)
+
+        q_b, k_b, v_b = np.split(
+            (params["blocks"][b]["attn"]["c_attn"])["b"], 3, axis=-1)
+        gpt.trf_blocks[b].att.W_query.bias = assign(
+            gpt.trf_blocks[b].att.W_query.bias, q_b)
+        gpt.trf_blocks[b].att.W_key.bias = assign(
+            gpt.trf_blocks[b].att.W_key.bias, k_b)
+        gpt.trf_blocks[b].att.W_value.bias = assign(
+            gpt.trf_blocks[b].att.W_value.bias, v_b)
+
+        gpt.trf_blocks[b].att.out_proj.weight = assign(
+            gpt.trf_blocks[b].att.out_proj.weight,
+            params["blocks"][b]["attn"]["c_proj"]["w"].T)
+        gpt.trf_blocks[b].att.out_proj.bias = assign(
+            gpt.trf_blocks[b].att.out_proj.bias,
+            params["blocks"][b]["attn"]["c_proj"]["b"])
+
+        gpt.trf_blocks[b].ff.layers[0].weight = assign(
+            gpt.trf_blocks[b].ff.layers[0].weight,
+            params["blocks"][b]["mlp"]["c_fc"]["w"].T)
+        gpt.trf_blocks[b].ff.layers[0].bias = assign(
+            gpt.trf_blocks[b].ff.layers[0].bias,
+            params["blocks"][b]["mlp"]["c_fc"]["b"])
+        gpt.trf_blocks[b].ff.layers[2].weight = assign(
+            gpt.trf_blocks[b].ff.layers[2].weight,
+            params["blocks"][b]["mlp"]["c_proj"]["w"].T)
+        gpt.trf_blocks[b].ff.layers[2].bias = assign(
+            gpt.trf_blocks[b].ff.layers[2].bias,
+            params["blocks"][b]["mlp"]["c_proj"]["b"])
+
+        gpt.trf_blocks[b].norm1.scale = assign(
+            gpt.trf_blocks[b].norm1.scale,
+            params["blocks"][b]["ln_1"]["g"])
+        gpt.trf_blocks[b].norm1.shift = assign(
+            gpt.trf_blocks[b].norm1.shift,
+            params["blocks"][b]["ln_1"]["b"])
+        gpt.trf_blocks[b].norm2.scale = assign(
+            gpt.trf_blocks[b].norm2.scale,
+            params["blocks"][b]["ln_2"]["g"])
+        gpt.trf_blocks[b].norm2.shift = assign(
+            gpt.trf_blocks[b].norm2.shift,
+            params["blocks"][b]["ln_2"]["b"])
+
+    gpt.final_norm.scale = assign(gpt.final_norm.scale, params["g"])
params["g"]) + gpt.final_norm.shift = assign(gpt.final_norm.shift, params["b"]) + gpt.out_head.weight = assign(gpt.out_head.weight, params["wte"]) + + +def text_to_token_ids(text, tokenizer): + encoded = tokenizer.encode(text, allowed_special={'<|endoftext|>'}) + encoded_tensor = torch.tensor(encoded).unsqueeze(0) # add batch dimension + return encoded_tensor + + +def token_ids_to_text(token_ids, tokenizer): + flat = token_ids.squeeze(0) # remove batch dimension + return tokenizer.decode(flat.tolist()) + + +def calc_loss_loader(data_loader, model, device, num_batches=None): + total_loss = 0. + if len(data_loader) == 0: + return float("nan") + elif num_batches is None: + num_batches = len(data_loader) + else: + # Reduce the number of batches to match the total number of batches in the data loader + # if num_batches exceeds the number of batches in the data loader + num_batches = min(num_batches, len(data_loader)) + for i, (input_batch, target_batch) in enumerate(data_loader): + if i < num_batches: + loss = calc_loss_batch(input_batch, target_batch, model, device) + total_loss += loss.item() + else: + break + return total_loss / num_batches + + +def evaluate_model(model, train_loader, val_loader, device, eval_iter): + model.eval() + with torch.no_grad(): + train_loss = calc_loss_loader(train_loader, model, device, num_batches=eval_iter) + val_loss = calc_loss_loader(val_loader, model, device, num_batches=eval_iter) + model.train() + return train_loss, val_loss + + +##################################### +# Chapter 6 +##################################### + + +def download_and_unzip(url, zip_path, extracted_path, data_file_path): + if data_file_path.exists(): + print(f"{data_file_path} already exists. Skipping download and extraction.") + return + + # Downloading the file + with urllib.request.urlopen(url) as response: + with open(zip_path, "wb") as out_file: + out_file.write(response.read()) + + # Unzipping the file + with zipfile.ZipFile(zip_path, "r") as zip_ref: + zip_ref.extractall(extracted_path) + + # Add .tsv file extension + original_file_path = Path(extracted_path) / "SMSSpamCollection" + os.rename(original_file_path, data_file_path) + print(f"File downloaded and saved as {data_file_path}") + + +def create_balanced_dataset(df): + + # Count the instances of "spam" + num_spam = df[df["Label"] == "spam"].shape[0] + + # Randomly sample "ham' instances to match the number of 'spam' instances + ham_subset = df[df["Label"] == "ham"].sample(num_spam, random_state=123) + + # Combine ham "subset" with "spam" + balanced_df = pd.concat([ham_subset, df[df["Label"] == "spam"]]) + + return balanced_df + + +def random_split(df, train_frac, validation_frac): + # Shuffle the entire DataFrame + df = df.sample(frac=1, random_state=123).reset_index(drop=True) + + # Calculate split indices + train_end = int(len(df) * train_frac) + validation_end = train_end + int(len(df) * validation_frac) + + # Split the DataFrame + train_df = df[:train_end] + validation_df = df[train_end:validation_end] + test_df = df[validation_end:] + + return train_df, validation_df, test_df + + +class SpamDataset(Dataset): + def __init__(self, csv_file, tokenizer, max_length=None, pad_token_id=50256): + self.data = pd.read_csv(csv_file) + + # Pre-tokenize texts + self.encoded_texts = [ + tokenizer.encode(text) for text in self.data["Text"] + ] + + if max_length is None: + self.max_length = self._longest_encoded_length() + else: + self.max_length = max_length + # Truncate sequences if they are longer than max_length + 
+            self.encoded_texts = [
+                encoded_text[:self.max_length]
+                for encoded_text in self.encoded_texts
+            ]
+
+        # Pad all sequences to self.max_length
+        self.encoded_texts = [
+            encoded_text + [pad_token_id] * (self.max_length - len(encoded_text))
+            for encoded_text in self.encoded_texts
+        ]
+
+    def __getitem__(self, index):
+        encoded = self.encoded_texts[index]
+        label = self.data.iloc[index]["Label"]
+        return torch.tensor(encoded, dtype=torch.long), torch.tensor(label, dtype=torch.long)
+
+    def __len__(self):
+        return len(self.data)
+
+    def _longest_encoded_length(self):
+        max_length = 0
+        for encoded_text in self.encoded_texts:
+            encoded_length = len(encoded_text)
+            if encoded_length > max_length:
+                max_length = encoded_length
+        return max_length
+
+
+@torch.no_grad()  # Disable gradient tracking for efficiency
+def calc_accuracy_loader(data_loader, model, device, num_batches=None):
+    model.eval()
+    correct_predictions, num_examples = 0, 0
+
+    if num_batches is None:
+        num_batches = len(data_loader)
+    else:
+        num_batches = min(num_batches, len(data_loader))
+    for i, (input_batch, target_batch) in enumerate(data_loader):
+        if i < num_batches:
+            input_batch, target_batch = input_batch.to(device), target_batch.to(device)
+            logits = model(input_batch)[:, -1, :]  # Logits of last output token
+            predicted_labels = torch.argmax(logits, dim=-1)
+
+            num_examples += predicted_labels.shape[0]
+            correct_predictions += (predicted_labels == target_batch).sum().item()
+        else:
+            break
+    return correct_predictions / num_examples
+
+
+def calc_loss_batch(input_batch, target_batch, model, device):
+    input_batch, target_batch = input_batch.to(device), target_batch.to(device)
+    logits = model(input_batch)[:, -1, :]  # Logits of last output token
+    loss = torch.nn.functional.cross_entropy(logits, target_batch)
+    return loss
+
+
+# Overall the same as `train_model_simple` in chapter 5
+def train_classifier_simple(model, train_loader, val_loader, optimizer, device, num_epochs,
+                            eval_freq, eval_iter, tokenizer):
+    # Initialize lists to track losses and examples seen
+    train_losses, val_losses, train_accs, val_accs = [], [], [], []
+    examples_seen, global_step = 0, -1
+
+    # Main training loop
+    for epoch in range(num_epochs):
+        model.train()  # Set model to training mode
+
+        for input_batch, target_batch in train_loader:
+            optimizer.zero_grad()  # Reset loss gradients from previous batch iteration
+            loss = calc_loss_batch(input_batch, target_batch, model, device)
+            loss.backward()  # Calculate loss gradients
+            optimizer.step()  # Update model weights using loss gradients
+            examples_seen += input_batch.shape[0]  # New: track examples instead of tokens
+            global_step += 1
+
+            # Optional evaluation step
+            if global_step % eval_freq == 0:
+                train_loss, val_loss = evaluate_model(
+                    model, train_loader, val_loader, device, eval_iter)
+                train_losses.append(train_loss)
+                val_losses.append(val_loss)
+                print(f"Ep {epoch+1} (Step {global_step:06d}): "
+                      f"Train loss {train_loss:.3f}, Val loss {val_loss:.3f}")
+
+        # Calculate accuracy after each epoch
+        train_accuracy = calc_accuracy_loader(train_loader, model, device, num_batches=eval_iter)
+        val_accuracy = calc_accuracy_loader(val_loader, model, device, num_batches=eval_iter)
+        print(f"Training accuracy: {train_accuracy*100:.2f}% | ", end="")
+        print(f"Validation accuracy: {val_accuracy*100:.2f}%")
+        train_accs.append(train_accuracy)
+        val_accs.append(val_accuracy)
+
+    return train_losses, val_losses, train_accs, val_accs, examples_seen
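+
+
+# Example usage (an illustrative sketch; assumes `model`, `device`, and the
+# train/validation data loaders have been set up as in the main chapter code;
+# the hyperparameter values below are placeholders):
+#
+#     optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5, weight_decay=0.1)
+#     train_losses, val_losses, train_accs, val_accs, examples_seen = \
+#         train_classifier_simple(model, train_loader, val_loader, optimizer, device,
+#                                 num_epochs=5, eval_freq=50, eval_iter=5,
+#                                 tokenizer=tiktoken.get_encoding("gpt2"))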
+
+
+def plot_values(epochs_seen, examples_seen, train_values, val_values, label="loss"):
+    fig, ax1 = plt.subplots(figsize=(5, 3))
+
+    # Plot training and validation values against epochs
+    ax1.plot(epochs_seen, train_values, label=f"Training {label}")
+    ax1.plot(epochs_seen, val_values, linestyle="-.", label=f"Validation {label}")
+    ax1.set_xlabel("Epochs")
+    ax1.set_ylabel(label.capitalize())
+    ax1.legend()
+
+    # Create a second x-axis for examples seen
+    ax2 = ax1.twiny()  # Create a second x-axis that shares the same y-axis
+    ax2.plot(examples_seen, train_values, alpha=0)  # Invisible plot for aligning ticks
+    ax2.set_xlabel("Examples seen")
+
+    fig.tight_layout()  # Adjust layout to make room
+    plt.savefig(f"{label}-plot.pdf")
+    plt.show()
diff --git a/appendix-E/README.md b/appendix-E/README.md
new file mode 100644
index 0000000..a07d712
--- /dev/null
+++ b/appendix-E/README.md
@@ -0,0 +1,3 @@
+# Appendix E: Parameter-efficient Finetuning with LoRA
+
+- [01_main-chapter-code](01_main-chapter-code) contains the main chapter code.
\ No newline at end of file