{
"cells": [
{
"cell_type": "markdown",
"id": "e2e65c03-36d4-413f-9b23-5cdd816729ab",
"metadata": {},
"source": [
"<table style=\"width:100%\">\n",
"<tr>\n",
"<td style=\"vertical-align:middle; text-align:left;\">\n",
"<font size=\"2\">\n",
"Supplementary code for the <a href=\"http://mng.bz/orYv\">Build a Large Language Model From Scratch</a> book by <a href=\"https://sebastianraschka.com\">Sebastian Raschka</a><br>\n",
"<br>Code repository: <a href=\"https://github.com/rasbt/LLMs-from-scratch\">https://github.com/rasbt/LLMs-from-scratch</a>\n",
"</font>\n",
"</td>\n",
"<td style=\"vertical-align:middle; text-align:left;\">\n",
"<a href=\"http://mng.bz/orYv\"><img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/cover-small.webp\" width=\"100px\"></a>\n",
"</td>\n",
"</tr>\n",
"</table>"
]
},
{
"cell_type": "markdown",
"id": "6f678e62-7bcb-4405-86ae-dce94f494303",
"metadata": {
"id": "6f678e62-7bcb-4405-86ae-dce94f494303"
},
"source": [
"# Comparing Efficient Multi-Head Attention Implementations"
]
},
{
"cell_type": "markdown",
"id": "b742938a-4bfc-4527-a1f1-d5963508967d",
"metadata": {
"id": "b742938a-4bfc-4527-a1f1-d5963508967d"
},
"source": [
"This code notebook compares different ways to implement causal multi-head attention used in decoder-style LLMs like GPT, Llama, etc."
]
},
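{
"cell_type": "markdown",
"id": "5a2a1bd3-7c6e-4f28-9a31-2f4c8d9e0b11",
"metadata": {},
"source": [
"- As a point of reference, the cell below sketches the core computation that all implementations in this notebook share: causal (masked) scaled dot-product attention for a single head\n",
"- This is only an illustrative sketch with toy dimensions, not one of the benchmarked implementations"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7e0f4a92-3b8d-4c1a-a5e6-9d2b7c4f8a03",
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch of single-head causal scaled dot-product attention\n",
"# (toy dimensions; not one of the benchmarked implementations)\n",
"import torch\n",
"\n",
"b, num_tokens, head_dim = 2, 4, 8\n",
"torch.manual_seed(123)\n",
"q = torch.randn(b, num_tokens, head_dim)\n",
"k = torch.randn(b, num_tokens, head_dim)\n",
"v = torch.randn(b, num_tokens, head_dim)\n",
"\n",
"# Scaled dot products between queries and keys\n",
"scores = q @ k.transpose(-2, -1) / head_dim**0.5\n",
"\n",
"# Causal mask: each position may only attend to itself and earlier positions\n",
"mask = torch.triu(torch.ones(num_tokens, num_tokens), diagonal=1).bool()\n",
"scores = scores.masked_fill(mask, -torch.inf)\n",
"\n",
"weights = torch.softmax(scores, dim=-1)\n",
"print((weights @ v).shape)  # (b, num_tokens, head_dim)"
]
},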
{
"cell_type": "code",
"execution_count": 1,
"id": "7898551e-f582-48ac-9f66-3632abe2a93f",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "7898551e-f582-48ac-9f66-3632abe2a93f",
"outputId": "7d088260-3fa1-44f2-bd65-2a46e289f9d4"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"PyTorch version: 2.2.2\n",
"Running on cpu\n"
]
}
],
"source": [
"import torch\n",
"\n",
"torch.manual_seed(123)\n",
"device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
"print(f\"PyTorch version: {torch.__version__}\")\n",
"print(f\"Running on {device}\")\n",
"\n",
"batch_size = 8\n",
"context_len = 1024\n",
"embed_dim = 768\n",
"embeddings = torch.randn((batch_size, context_len, embed_dim), device=device)"
]
},
{
"cell_type": "markdown",
"id": "2f9bb1b6-a1e5-4e0a-884d-0f31b374a8d6",
"metadata": {
"id": "2f9bb1b6-a1e5-4e0a-884d-0f31b374a8d6"
},
"source": [
"<br>\n",
" \n",
"\n",
"## 1) CausalAttention MHA wrapper class from chapter 3"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "297c93ed-aec0-4896-bb89-42c4b294d3d1",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "297c93ed-aec0-4896-bb89-42c4b294d3d1",
"outputId": "f8a33752-2cd6-4101-8feb-9d1699984719"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"torch.Size([8, 1024, 768])\n"
]
}
],
"source": [
"from ch03 import MultiHeadAttentionWrapper as Ch03_MHA_Wrapper\n",
"\n",
"mha_ch03_wrapper = Ch03_MHA_Wrapper(\n",
" d_in=embed_dim,\n",
" d_out=embed_dim//12,\n",
" context_length=context_len,\n",
" dropout=0.0,\n",
" num_heads=12,\n",
" qkv_bias=False\n",
").to(device)\n",
"\n",
"out = mha_ch03_wrapper(embeddings)\n",
"print(out.shape)"
]
},
{
"cell_type": "markdown",
"id": "21930804-b327-40b1-8e63-94dcad39ce7b",
"metadata": {
"id": "21930804-b327-40b1-8e63-94dcad39ce7b"
},
"source": [
"<br>\n",
" \n",
"\n",
"## 2) The multi-head attention class from chapter 3"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "4ee6a61b-d25c-4a0c-8a59-f285544e3710",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "4ee6a61b-d25c-4a0c-8a59-f285544e3710",
"outputId": "b704a040-3547-422c-ecda-df9982a2da35"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"torch.Size([8, 1024, 768])\n"
]
}
],
"source": [
"from ch03 import MultiHeadAttention as Ch03_MHA\n",
"\n",
"mha_ch03 = Ch03_MHA(\n",
" d_in=embed_dim,\n",
" d_out=embed_dim,\n",
" context_length=context_len,\n",
" dropout=0.0,\n",
" num_heads=12,\n",
" qkv_bias=False\n",
").to(device)\n",
"\n",
"out = mha_ch03(embeddings)\n",
"print(out.shape)"
]
},
{
"cell_type": "markdown",
"id": "73cd11da-ea3b-4081-b483-c4965dfefbc4",
"metadata": {
"id": "73cd11da-ea3b-4081-b483-c4965dfefbc4"
},
"source": [
"<br>\n",
" \n",
"\n",
"## 3) An alternative multi-head attention with combined weights"
]
},
{
"cell_type": "markdown",
"id": "1fa1a5ea-eaff-4d2d-aaf0-b34cdb6fd4dd",
"metadata": {
"id": "1fa1a5ea-eaff-4d2d-aaf0-b34cdb6fd4dd"
},
"source": [
"- The code for the `MultiHeadAttentionAlt` class below is based on code that was kindly shared by [Rayed Bin Wahed](https://github.com/rasbt/LLMs-from-scratch/discussions/51)\n",
"- The main difference between the `MultiHeadAttentionAlt` class and the `MultiHeadAttention` class used in chapter 3 is that `MultiHeadAttentionAlt` uses a single weight matrix, `self.qkv = nn.Linear(d_in, 3 * d_out, bias=qkv_bias)` instead of separate weight matrices:\n",
"\n",
" - `self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias)`\n",
" - `self.W_key = nn.Linear(d_in, d_out, bias=qkv_bias)`\n",
" - `self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias)`\n",
"\n",
"- Here, `self.qkv` combines all three weight matrices `self.W_query`, `self.W_key`, and `self.W_value` to carry out the query, key, and value computation in a single step\n",
"- Using `q, k, v = qkv.unbind(0)`, we obtain the individual query, key, and value tensors, which are then used similarly to the query, key, and value tensors in the `MultiHeadAttention` class in chapter 3"
]
},
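{
"cell_type": "markdown",
"id": "c3d19f4e-8a2b-4e7d-b6f1-0a5c9e2d7b84",
"metadata": {},
"source": [
"- The short sketch below illustrates why the two parameterizations are interchangeable: the combined `qkv` weight matrix is just the three separate weight matrices stacked row-wise, so splitting its output recovers the individual query, key, and value projections (toy dimensions, for illustration only)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2b7e9d41-5f3c-4a8e-9c0d-6e1f4a8b2c57",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: a combined QKV projection equals three separate projections whose\n",
"# weight matrices are stacked row-wise (illustrative check, toy dimensions)\n",
"import torch\n",
"import torch.nn as nn\n",
"\n",
"torch.manual_seed(123)\n",
"d_in_toy, d_out_toy = 6, 6\n",
"qkv = nn.Linear(d_in_toy, 3 * d_out_toy, bias=False)\n",
"\n",
"W_query = nn.Linear(d_in_toy, d_out_toy, bias=False)\n",
"W_key = nn.Linear(d_in_toy, d_out_toy, bias=False)\n",
"W_value = nn.Linear(d_in_toy, d_out_toy, bias=False)\n",
"\n",
"# Copy the row blocks of the combined weight matrix into the separate layers\n",
"with torch.no_grad():\n",
"    W_query.weight.copy_(qkv.weight[:d_out_toy])\n",
"    W_key.weight.copy_(qkv.weight[d_out_toy:2*d_out_toy])\n",
"    W_value.weight.copy_(qkv.weight[2*d_out_toy:])\n",
"\n",
"x = torch.randn(2, 4, d_in_toy)\n",
"q, k, v = qkv(x).chunk(3, dim=-1)\n",
"print(torch.allclose(q, W_query(x)), torch.allclose(k, W_key(x)), torch.allclose(v, W_value(x)))"
]
},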
{
"cell_type": "code",
"execution_count": 4,
"id": "9a6bd0a2-f27c-4602-afa0-c96cd295c1a6",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "9a6bd0a2-f27c-4602-afa0-c96cd295c1a6",
"outputId": "5d948671-176f-4633-bede-97767e36becc"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"torch.Size([8, 1024, 768])\n"
]
}
],
"source": [
"import torch.nn as nn\n",
"\n",
"\n",
"class MultiHeadAttentionCombinedQKV(nn.Module):\n",
" def __init__(self, d_in, d_out, num_heads, context_length, dropout=0.0, qkv_bias=False):\n",
" super().__init__()\n",
"\n",
" assert d_out % num_heads == 0, \"embed_dim is indivisible by num_heads\"\n",
"\n",
" self.num_heads = num_heads\n",
" self.context_length = context_length\n",
" self.head_dim = d_out // num_heads\n",
"\n",
" self.qkv = nn.Linear(d_in, 3 * d_out, bias=qkv_bias)\n",
" self.proj = nn.Linear(d_out, d_out)\n",
" self.dropout = nn.Dropout(dropout)\n",
"\n",
" self.register_buffer(\n",
" \"mask\", torch.triu(torch.ones(context_length, context_length), diagonal=1)\n",
" )\n",
"\n",
" def forward(self, x):\n",
" batch_size, num_tokens, embed_dim = x.shape\n",
"\n",
" # (b, num_tokens, embed_dim) --> (b, num_tokens, 3 * embed_dim)\n",
" qkv = self.qkv(x)\n",
"\n",
" # (b, num_tokens, 3 * embed_dim) --> (b, num_tokens, 3, num_heads, head_dim)\n",
" qkv = qkv.view(batch_size, num_tokens, 3, self.num_heads, self.head_dim)\n",
"\n",
" # (b, num_tokens, 3, num_heads, head_dim) --> (3, b, num_heads, num_tokens, head_dim)\n",
" qkv = qkv.permute(2, 0, 3, 1, 4)\n",
"\n",
" # (3, b, num_heads, num_tokens, head_dim) -> 3 times (b, num_head, num_tokens, head_dim)\n",
" queries, keys, values = qkv.unbind(0)\n",
"\n",
" # (b, num_heads, num_tokens, head_dim) --> (b, num_heads, num_tokens, num_tokens)\n",
" attn_scores = queries @ keys.transpose(-2, -1)\n",
" attn_scores = attn_scores.masked_fill(\n",
" self.mask.bool()[:num_tokens, :num_tokens], -torch.inf\n",
" )\n",
"\n",
" attn_weights = torch.softmax(attn_scores / keys.shape[-1]**-0.5, dim=-1)\n",
" attn_weights = self.dropout(attn_weights)\n",
"\n",
" # (b, num_heads, num_tokens, num_tokens) --> (b, num_heads, num_tokens, head_dim)\n",
" context_vec = attn_weights @ values\n",
"\n",
" # (b, num_heads, num_tokens, head_dim) --> (b, num_tokens, num_heads, head_dim)\n",
" context_vec = context_vec.transpose(1, 2)\n",
"\n",
" # (b, num_tokens, num_heads, head_dim) --> (b, num_tokens, embed_dim)\n",
" context_vec = context_vec.contiguous().view(batch_size, num_tokens, embed_dim)\n",
"\n",
" context_vec = self.proj(context_vec)\n",
"\n",
" return context_vec\n",
"\n",
"\n",
"mha_combined_qkv = MultiHeadAttentionCombinedQKV(\n",
" d_in=embed_dim,\n",
" d_out=embed_dim,\n",
" context_length=context_len,\n",
" dropout=0.0,\n",
" num_heads=12,\n",
" qkv_bias=False\n",
").to(device)\n",
"\n",
"out = mha_combined_qkv(embeddings)\n",
"print(out.shape)"
]
},
{
"cell_type": "markdown",
"id": "48a042d3-ee78-4c29-bf63-d92fe6706632",
"metadata": {
"id": "48a042d3-ee78-4c29-bf63-d92fe6706632"
},
"source": [
"<br>\n",
" \n",
"\n",
"## 4) Multihead attention with PyTorch's scaled dot product attention"
]
},
{
"cell_type": "markdown",
"id": "f78e346f-3b85-44e6-9feb-f01131381148",
"metadata": {
"id": "f78e346f-3b85-44e6-9feb-f01131381148"
},
"source": [
"- The implementation below uses PyTorch's [`scaled_dot_product_attention`](https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html) function, which implements a memory-optimized version of self-attention calld [flash attention](https://arxiv.org/abs/2205.14135)"
]
},
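{
"cell_type": "markdown",
"id": "9f4c2e8a-1d7b-4c3f-8e5a-2b9d6f0c4e71",
"metadata": {},
"source": [
"- As a quick plausibility check on toy tensors, the sketch below compares `scaled_dot_product_attention` with `is_causal=True` against a manual masked-softmax implementation; the two should agree up to small numerical differences"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4e8b1f6d-2c9a-4d5e-b7f3-8a0c5d2e9f16",
"metadata": {},
"outputs": [],
"source": [
"# Sanity-check sketch: scaled_dot_product_attention with is_causal=True should\n",
"# match a manual masked-softmax implementation (illustrative, toy dimensions)\n",
"import torch\n",
"\n",
"torch.manual_seed(123)\n",
"b, h, n, d = 2, 3, 4, 8\n",
"q = torch.randn(b, h, n, d)\n",
"k = torch.randn(b, h, n, d)\n",
"v = torch.randn(b, h, n, d)\n",
"\n",
"sdpa_out = torch.nn.functional.scaled_dot_product_attention(q, k, v, is_causal=True)\n",
"\n",
"scores = q @ k.transpose(-2, -1) / d**0.5\n",
"mask = torch.triu(torch.ones(n, n), diagonal=1).bool()\n",
"manual_out = torch.softmax(scores.masked_fill(mask, -torch.inf), dim=-1) @ v\n",
"\n",
"print(torch.allclose(sdpa_out, manual_out, atol=1e-6))"
]
},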
{
"cell_type": "code",
"execution_count": 5,
"id": "1b8e5a0d-1f65-4a03-bf6e-723f0cc428f5",
"metadata": {
"id": "1b8e5a0d-1f65-4a03-bf6e-723f0cc428f5"
},
"outputs": [],
"source": [
"class MHAPyTorchScaledDotProduct(nn.Module):\n",
" def __init__(self, d_in, d_out, num_heads, context_length, dropout=0.0, qkv_bias=False):\n",
" super().__init__()\n",
"\n",
" assert d_out % num_heads == 0, \"embed_dim is indivisible by num_heads\"\n",
"\n",
" self.num_heads = num_heads\n",
" self.context_length = context_length\n",
" self.head_dim = d_out // num_heads\n",
" self.d_out = d_out\n",
"\n",
" self.qkv = nn.Linear(d_in, 3 * d_out, bias=qkv_bias)\n",
" self.proj = nn.Linear(d_out, d_out)\n",
" self.dropout = dropout\n",
"\n",
" def forward(self, x):\n",
" batch_size, num_tokens, embed_dim = x.shape\n",
"\n",
" # (b, num_tokens, embed_dim) --> (b, num_tokens, 3 * embed_dim)\n",
" qkv = self.qkv(x)\n",
"\n",
" # (b, num_tokens, 3 * embed_dim) --> (b, num_tokens, 3, num_heads, head_dim)\n",
" qkv = qkv.view(batch_size, num_tokens, 3, self.num_heads, self.head_dim)\n",
"\n",
" # (b, num_tokens, 3, num_heads, head_dim) --> (3, b, num_heads, num_tokens, head_dim)\n",
" qkv = qkv.permute(2, 0, 3, 1, 4)\n",
"\n",
" # (3, b, num_heads, num_tokens, head_dim) -> 3 times (b, num_heads, num_tokens, head_dim)\n",
" queries, keys, values = qkv\n",
"\n",
" use_dropout = 0. if not self.training else self.dropout\n",
" context_vec = nn.functional.scaled_dot_product_attention(\n",
" queries, keys, values, attn_mask=None, dropout_p=use_dropout, is_causal=True)\n",
"\n",
" # Combine heads, where self.d_out = self.num_heads * self.head_dim\n",
" context_vec = context_vec.transpose(1, 2).contiguous().view(batch_size, num_tokens, self.d_out)\n",
"\n",
" context_vec = self.proj(context_vec)\n",
"\n",
" return context_vec"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "fbc8ba92-3471-41cb-b1b2-4c0ef5be392b",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "fbc8ba92-3471-41cb-b1b2-4c0ef5be392b",
"outputId": "af9e4855-7f20-4d61-8532-4827df8dfb30"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"torch.Size([8, 1024, 768])\n"
]
}
],
"source": [
"mha_pytorch_scaled = MHAPyTorchScaledDotProduct(\n",
" d_in=embed_dim,\n",
" d_out=embed_dim,\n",
" context_length=context_len,\n",
" dropout=0.0,\n",
" num_heads=12,\n",
" qkv_bias=False\n",
").to(device)\n",
"\n",
"out = mha_pytorch_scaled(embeddings)\n",
"print(out.shape)"
]
},
{
"cell_type": "markdown",
"id": "351c318f-4835-4d74-8d58-a070222447c4",
"metadata": {
"id": "351c318f-4835-4d74-8d58-a070222447c4"
},
"source": [
"<br>\n",
" \n",
"\n",
"## 5) Using PyTorch's torch.nn.MultiheadAttention"
]
},
{
"cell_type": "markdown",
"id": "74a6d060-6324-48fa-a35c-cb09f2a48965",
"metadata": {
"id": "74a6d060-6324-48fa-a35c-cb09f2a48965"
},
"source": [
"- Below, we use PyTorch's [torch.nn.MultiheadAttention](https://pytorch.org/docs/stable/generated/torch.nn.MultiheadAttention.html) implementation"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "3799c7ef-3155-42c6-a829-f95656453ae0",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "3799c7ef-3155-42c6-a829-f95656453ae0",
"outputId": "2a085df8-0445-4818-9978-6dc74469f568"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"torch.Size([8, 1024, 768])\n"
]
}
],
"source": [
"import torch.nn as nn\n",
"\n",
"\n",
"class MHAPyTorchClass(nn.Module):\n",
" def __init__(self, d_in, d_out, num_heads, context_length, dropout=0.0, qkv_bias=False, need_weights=True):\n",
" super().__init__()\n",
"\n",
" self.context_length = context_length\n",
" self.multihead_attn = nn.MultiheadAttention(\n",
" embed_dim=d_out,\n",
" num_heads=num_heads,\n",
" dropout=dropout,\n",
" bias=qkv_bias,\n",
" add_bias_kv=qkv_bias,\n",
" batch_first=True,\n",
" )\n",
"\n",
" self.need_weights = need_weights\n",
" self.proj = nn.Linear(d_out, d_out)\n",
" self.register_buffer(\"mask\", torch.triu(torch.ones(context_length, context_length), diagonal=1).bool())\n",
"\n",
" def forward(self, x):\n",
" batch_size, num_tokens, _ = x.shape\n",
"\n",
" # Ensure attn_mask is compatible with expected shape and `batch_first=True`\n",
" # No need to manually adjust for num_heads; ensure it's right for the sequence\n",
2024-04-04 07:27:41 -05:00
" if self.context_length >= num_tokens:\n",
2024-03-13 08:37:54 -05:00
" attn_mask = self.mask[:num_tokens, :num_tokens]\n",
" else:\n",
2024-04-04 07:27:41 -05:00
" attn_mask = self.mask[:self.context_length, :self.context_length]\n",
2024-03-13 08:37:54 -05:00
"\n",
" # attn_mask broadcasting will handle batch_size dimension implicitly\n",
" attn_output, _ = self.multihead_attn(\n",
" x, x, x, attn_mask=attn_mask, need_weights=self.need_weights\n",
" )\n",
"\n",
" output = self.proj(attn_output)\n",
"\n",
" return output\n",
"\n",
"\n",
"mha_pytorch_class_default = MHAPyTorchClass(\n",
" d_in=embed_dim,\n",
" d_out=embed_dim,\n",
" context_length=context_len,\n",
" dropout=0.0,\n",
" num_heads=12,\n",
" qkv_bias=False\n",
").to(device)\n",
"\n",
"out = mha_pytorch_class_default(embeddings)\n",
"print(out.shape)"
]
},
{
"cell_type": "markdown",
"id": "a3953bff-1056-4de2-bfd1-dfccf659eee4",
"metadata": {
"id": "a3953bff-1056-4de2-bfd1-dfccf659eee4"
},
"source": [
"<br>\n",
" \n",
"\n",
"## 6) Using PyTorch's torch.nn.MultiheadAttention with `scaled_dot_product_attention`"
]
},
{
"cell_type": "markdown",
"id": "d2164859-31a0-4537-b4fb-27d57675ba77",
"metadata": {
"id": "d2164859-31a0-4537-b4fb-27d57675ba77"
},
"source": [
"- Set `need_weights` (default `True`) to need_weights=False so that MultiheadAttention uses `scaled_dot_product_attention` [according to the documentation](https://github.com/pytorch/pytorch/blob/71d020262793542974cf13b30f2a9099773f015c/torch/nn/modules/activation.py#L1096)\n",
"\n",
"> need_weights: If specified, returns ``attn_output_weights`` in addition to ``attn_outputs``.\n",
" Set ``need_weights=False`` to use the optimized ``scaled_dot_product_attention``\n",
" and achieve the best performance for MHA.\n",
" Default: ``True``."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "4a4c2afe-5e1f-4bd7-a118-67031176f147",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "4a4c2afe-5e1f-4bd7-a118-67031176f147",
"outputId": "234771f4-8a53-4478-8a9b-cf19f79a5e07"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"torch.Size([8, 1024, 768])\n"
]
}
],
"source": [
"mha_pytorch_class_noweights = MHAPyTorchClass(\n",
" d_in=embed_dim,\n",
" d_out=embed_dim,\n",
" context_length=context_len,\n",
" dropout=0.0,\n",
" num_heads=12,\n",
" qkv_bias=False,\n",
" need_weights=False # NEW!\n",
").to(device)\n",
"\n",
"out = mha_pytorch_class_noweights(embeddings)\n",
"print(out.shape)"
]
},
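{
"cell_type": "markdown",
"id": "6d2f8c4b-9e1a-4f7c-a3d5-0b8e6c4f2a93",
"metadata": {},
"source": [
"- As a sanity check, the sketch below copies `mha_pytorch_class_default` from section 5, switches the copy to `need_weights=False`, and verifies that both settings produce numerically close outputs; only the speed should differ"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1c5e9b3f-7d2a-4e8c-b4f6-3a9d0e5c7b28",
"metadata": {},
"outputs": [],
"source": [
"# Sanity-check sketch: with identical weights, need_weights=False should only\n",
"# change the execution path, not the outputs (up to small numerical differences)\n",
"import copy\n",
"\n",
"mha_check = copy.deepcopy(mha_pytorch_class_default)\n",
"mha_check.need_weights = False  # switch the copy to the optimized path\n",
"\n",
"with torch.no_grad():\n",
"    out_default = mha_pytorch_class_default(embeddings)\n",
"    out_fast = mha_check(embeddings)\n",
"\n",
"print(torch.allclose(out_default, out_fast, atol=1e-5))"
]
},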
{
"cell_type": "markdown",
"id": "8877de71-f84f-4f6d-bc87-7552013b6301",
"metadata": {
"id": "8877de71-f84f-4f6d-bc87-7552013b6301"
},
"source": [
"<br>\n",
" \n",
"\n",
"## Quick speed comparison (M3 MacBook Air CPU)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "a97c0b2e-6593-49d8-98bc-2267b3aa610f",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "a97c0b2e-6593-49d8-98bc-2267b3aa610f",
"outputId": "ebe635b2-5c03-4e9b-da3a-951d308acf7b"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"191 ms ± 2.47 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
"source": [
"## 1) CausalAttention MHA wrapper class from chapter 3\n",
"%timeit mha_ch03_wrapper(embeddings)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "19db9c2c-8e75-431a-8eef-0b4d8284e6e6",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "19db9c2c-8e75-431a-8eef-0b4d8284e6e6",
"outputId": "c6e7bcff-661c-45a6-da82-b1e3f89cf761"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"186 ms ± 2.94 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
]
}
],
"source": [
"## 2) The multi-head attention class from chapter 3\n",
"%timeit mha_ch03(embeddings)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "aa526ee0-7a88-4f34-a49a-f8f97da83779",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "aa526ee0-7a88-4f34-a49a-f8f97da83779",
"outputId": "92b634f8-43f8-468f-87a1-bb774b64c212"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"207 ms ± 1.68 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
"source": [
"## 3) An alternative multi-head attention with combined weights\n",
"%timeit mha_combined_qkv(embeddings)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "cc2b4256-16d8-4c34-9fd0-d4b4af0e60fa",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "cc2b4256-16d8-4c34-9fd0-d4b4af0e60fa",
"outputId": "80c6e314-0771-470e-b090-628984ce2d85"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"73.3 ms ± 654 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
]
}
],
"source": [
"## 4) Multihead attention with PyTorch's scaled dot product attention\n",
"%timeit mha_pytorch_scaled(embeddings)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "0f209e70-ebb6-4a1a-b608-1ff42e41c01d",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "0f209e70-ebb6-4a1a-b608-1ff42e41c01d",
"outputId": "3cd37b53-04d4-4dd0-9450-6fc8ebaac083"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"210 ms ± 12.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
"source": [
"## 5) Using PyTorch's torch.nn.MultiheadAttention\n",
"%timeit mha_pytorch_class_default(embeddings)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "3f4968c2-8d40-4ab9-8dba-052b4f77d756",
"metadata": {
"id": "3f4968c2-8d40-4ab9-8dba-052b4f77d756",
"outputId": "2e86bdb4-7fa0-4051-b000-4a2b591060a2",
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"199 ms ± 6.48 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
]
}
],
"source": [
"## 6) Using PyTorch's torch.nn.MultiheadAttention disabling `need_weights`\n",
"%timeit mha_pytorch_class_noweights(embeddings)"
]
},
{
"cell_type": "markdown",
"id": "a78ff594-6cc2-496d-a302-789fa104c3c9",
"metadata": {
"id": "a78ff594-6cc2-496d-a302-789fa104c3c9"
},
"source": [
"<br>\n",
" \n",
"\n",
"## Quick speed comparison (Nvidia A100 GPU)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "707a2a14-a089-48a8-88aa-d328e1e0a9d0",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "707a2a14-a089-48a8-88aa-d328e1e0a9d0",
"outputId": "e99a17e9-8139-4b04-dac8-fa1dd5027735"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"8 ms ± 1.42 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
]
}
],
"source": [
"## 1) CausalAttention MHA wrapper class from chapter 3\n",
"%timeit mha_ch03_wrapper(embeddings)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "8686dd69-3655-40e4-a57b-a2c55532a010",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "8686dd69-3655-40e4-a57b-a2c55532a010",
"outputId": "5553b42c-b709-41a4-8a8b-be36dae408ab"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"6.22 ms ± 490 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
]
}
],
"source": [
"## 2) The multi-head attention class from chapter 3\n",
"%timeit mha_ch03(embeddings)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "2209d7df-e54b-4910-ae2b-c78cf684d9bf",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "2209d7df-e54b-4910-ae2b-c78cf684d9bf",
"outputId": "01b0da88-510b-4b21-919a-0a7519a55ed8"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"6.85 ms ± 824 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
]
}
],
"source": [
"## 3) An alternative multi-head attention with combined weights\n",
"%timeit mha_combined_qkv(embeddings)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "1075abe2-4839-4fd6-af3e-c09bb3651e26",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "1075abe2-4839-4fd6-af3e-c09bb3651e26",
"outputId": "542706db-5041-45ca-f667-9e1bd1c2c7aa"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2.95 ms ± 336 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)\n"
]
}
],
"source": [
"## 4) Multihead attention with PyTorch's scaled dot product attention\n",
"%timeit mha_pytorch_scaled(embeddings)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "868e3670-8edc-47bc-9e06-eb505e44dc9d",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "868e3670-8edc-47bc-9e06-eb505e44dc9d",
"outputId": "13cfc808-2b11-4041-fe67-e5a63abe4f28"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"6.39 ms ± 672 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
]
}
],
"source": [
"## 5) Using PyTorch's torch.nn.MultiheadAttention\n",
"%timeit mha_pytorch_class_default(embeddings)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "944870e6-de54-4e3b-a455-b8f21f6f92c8",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "944870e6-de54-4e3b-a455-b8f21f6f92c8",
"outputId": "c52858e7-999c-4782-adc9-731f8d69dfa6"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"4.49 ms ± 3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
]
}
],
"source": [
"## 6) Using PyTorch's torch.nn.MultiheadAttention disabling `need_weights`\n",
"%timeit mha_pytorch_class_noweights(embeddings)"
]
},
{
"cell_type": "markdown",
"id": "dabc6575-0316-4640-a729-e616d5c17b73",
"metadata": {
"id": "dabc6575-0316-4640-a729-e616d5c17b73"
},
"source": [
"<br>\n",
" \n",
"\n",
"\n",
"## Speed comparison (Nvidia A100 GPU) with warmup"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "29b63d3d-6d0b-43bb-9c68-d5514dc81000",
"metadata": {
"id": "29b63d3d-6d0b-43bb-9c68-d5514dc81000"
},
"outputs": [],
"source": [
"# CUDA benchmark code shared by Andrei Aksionov\n",
"# and based on code from\n",
"# https://github.com/cuda-mode/lectures/blob/main/lecture1/pytorch_square.py\n",
"\n",
"def time_pytorch_function(func, *input, num_repeats = 1_000):\n",
" # CUDA IS ASYNC so can't use python time module\n",
" start = torch.cuda.Event(enable_timing=True)\n",
" end = torch.cuda.Event(enable_timing=True)\n",
"\n",
" # Warmup\n",
" for _ in range(5):\n",
" func(*input)\n",
" torch.cuda.synchronize()\n",
"\n",
" start.record()\n",
" for _ in range(num_repeats):\n",
" func(*input)\n",
" torch.cuda.synchronize()\n",
" end.record()\n",
" torch.cuda.synchronize()\n",
" return start.elapsed_time(end) / num_repeats"
]
},
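{
"cell_type": "markdown",
"id": "8b4d6f2a-3e9c-4b7d-9f1e-5c0a8d3b6e42",
"metadata": {},
"source": [
"- Because CUDA kernel launches are asynchronous, timing with CUDA events after a warmup reflects actual GPU execution, unlike naive wall-clock timing; a minimal usage sketch follows (requires a CUDA device, numbers vary by hardware)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0a6c3e9b-4f8d-4a2e-8c5b-7d1f9e4a0c63",
"metadata": {},
"outputs": [],
"source": [
"# Minimal usage sketch for the timing helper above (requires a CUDA device)\n",
"if torch.cuda.is_available():\n",
"    print(time_pytorch_function(mha_pytorch_scaled, embeddings.to(\"cuda\")), \"ms per forward pass\")"
]
},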
{
"cell_type": "code",
"execution_count": 16,
"id": "CDJAPZaszaqx",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 489
},
"id": "CDJAPZaszaqx",
"outputId": "f23e9b83-7fd6-4011-9434-0e6934cf762a"
},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"\n",
"import matplotlib.pyplot as plt\n",
"\n",
"\n",
"embeddings_cuda = embeddings.to(torch.device(\"cuda\"))\n",
"\n",
"functions = {\n",
" \"1) MHA wrapper class\": mha_ch03_wrapper,\n",
" \"2) MHA Ch03\": mha_ch03,\n",
" \"3) MHA with combined QKV weights\": mha_combined_qkv,\n",
" \"4) MHA with PyTorch scaled_dot_product_attention\": mha_pytorch_scaled,\n",
" \"5) PyTorch MHA class defaults\": mha_pytorch_class_default,\n",
" \"6) PyTorch MHA with need_weights=False\": mha_pytorch_class_noweights\n",
"}\n",
"execution_times = [time_pytorch_function(fn, embeddings_cuda) for name,fn in functions.items()]\n",
"\n",
"\n",
"# Plotting\n",
"\n",
"# Customize further for dark mode aesthetics\n",
"plt.rcParams['figure.facecolor'] = '#121212' # Dark figure background\n",
"plt.rcParams['axes.facecolor'] = '#121212' # Dark axes background\n",
"plt.rcParams['axes.edgecolor'] = 'white' # White axes border\n",
"plt.rcParams['axes.labelcolor'] = 'white' # White labels\n",
"plt.rcParams['text.color'] = 'white' # White text\n",
"plt.rcParams['xtick.color'] = 'white' # White x ticks\n",
"plt.rcParams['ytick.color'] = 'white' # White y ticks\n",
"plt.rcParams['grid.color'] = '#444444' # Lighter grid lines for contrast\n",
"plt.rcParams['lines.linewidth'] = 2 # Thicker plot lines for visibility\n",
"plt.rcParams['lines.markersize'] = 8 # Larger markers for visibility\n",
"\n",
"fig, ax = plt.subplots()\n",
"bars = plt.bar(functions.keys(), execution_times)\n",
"\n",
"plt.ylabel('Execution time (ms)')\n",
"plt.xticks(rotation=45, ha=\"right\")\n",
"\n",
"# Calculate new ylim with a margin\n",
"max_execution_time = max(execution_times)\n",
"upper_ylim = max_execution_time + 0.2 * max_execution_time # Adding a 20% margin\n",
"\n",
"plt.ylim(0, upper_ylim) # Setting new ylim\n",
"\n",
"# Annotate bars with execution times\n",
"for bar in bars:\n",
" yval = bar.get_height()\n",
" plt.text(bar.get_x() + bar.get_width()/2, yval + (0.05 * upper_ylim), round(yval, 2), ha='center', va='bottom')\n",
"\n",
"\n",
"plt.tight_layout()\n",
"plt.savefig(\"1.pdf\")\n",
"plt.show()\n"
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"gpuType": "A100",
"machine_shape": "hm",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
}
},
"nbformat": 4,
"nbformat_minor": 5
}