rasbt 2024-05-10 07:02:14 -05:00
parent d8de9377de
commit 774974de97


@@ -1440,6 +1440,15 @@
     "print(\"Outputs dimensions:\", outputs.shape) # shape: (batch_size, num_tokens, num_classes)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "75430a01-ef9c-426a-aca0-664689c4f461",
+   "metadata": {},
+   "source": [
+    "- As discussed in previous chapters, for each input token, there's one output vector\n",
+    "- Since we fed the model a text sample with 4 input tokens, the output consists of 4 2-dimensional output vectors above"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "7df9144f-6817-4be4-8d4b-5d4dadfe4a9b",
@@ -1453,11 +1462,9 @@
    "id": "e3bb8616-c791-4f5c-bac0-5302f663e46a",
    "metadata": {},
    "source": [
-    "- As discussed in previous chapters, for each input token, there's one output vector\n",
-    "- Since we fed the model a text sample with 6 input tokens, the output consists of 6 2-dimensional output vectors above\n",
     "- In chapter 3, we discussed the attention mechanism, which connects each input token to each other input token\n",
     "- In chapter 3, we then also introduced the causal attention mask that is used in GPT-like models; this causal mask lets a current token only attend to the current and previous token positions\n",
-    "- Based on this causal attention mechanism, the 6th (last) token above contains the most information among all tokens because it's the only token that includes information about all other tokens\n",
+    "- Based on this causal attention mechanism, the 4th (last) token contains the most information among all tokens because it's the only token that includes information about all other tokens\n",
     "- Hence, we are particularly interested in this last token, which we will finetune for the spam classification task"
    ]
   },
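To make the last-token argument concrete, here is a short, hedged sketch of selecting that token's output for classification, reusing the illustrative `outputs` tensor from the sketch above (the notebook's own code may differ in naming):

```python
# Under the causal mask, only the final token has attended to all other
# tokens, so its output vector is the one used for spam classification.
last_token_output = outputs[:, -1, :]  # shape: (batch_size, num_classes)
print("Last output token:", last_token_output)
```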
@@ -2265,7 +2272,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.10.6"
+   "version": "3.10.12"
   }
  },
  "nbformat": 4,