{
"cells": [
{
"cell_type": "markdown",
"id": "ba450fb1-8a26-4894-ab7a-5d7bfefe90ce",
"metadata": {},
"source": [
"<font size=\"1\">\n",
"Supplementary code for \"Build a Large Language Model From Scratch\": <a href=\"https://www.manning.com/books/build-a-large-language-model-from-scratch\">https://www.manning.com/books/build-a-large-language-model-from-scratch</a> by <a href=\"https://sebastianraschka.com\">Sebastian Raschka</a><br>\n",
"Code repository: <a href=\"https://github.com/rasbt/LLMs-from-scratch\">https://github.com/rasbt/LLMs-from-scratch</a>\n",
"</font>"
]
},
{
"cell_type": "markdown",
"id": "51c9672d-8d0c-470d-ac2d-1271f8ec3f14",
"metadata": {},
"source": [
"# Chapter 6 Exercise solutions"
]
},
{
"cell_type": "markdown",
"id": "5fea8be3-30a1-4623-a6d7-b095c6c1092e",
"metadata": {},
"source": [
"## Exercise 6.1: Increasing the context length"
]
},
{
"cell_type": "markdown",
"id": "5860ba9f-2db3-4480-b96b-4be1c68981eb",
"metadata": {},
"source": [
"We can pad the inputs to the maximum number of tokens the model supports by setting the max length to 1024:\n",
"\n",
"```python\n",
"max_length = 1024\n",
"\n",
"train_dataset = SpamDataset(base_path / \"train.csv\", max_length=max_length, tokenizer=tokenizer)\n",
"val_dataset = SpamDataset(base_path / \"validation.csv\", max_length=max_length, tokenizer=tokenizer)\n",
"test_dataset = SpamDataset(base_path / \"test.csv\", max_length=max_length, tokenizer=tokenizer)\n",
"\n",
"```\n",
"\n",
"or, equivalently, we can define the `max_length` via:\n",
"\n",
"```python\n",
"max_length = model.pos_emb.weight.shape[0]\n",
"```\n",
"\n",
"or\n",
"\n",
"```python\n",
"max_length = BASE_CONFIG[\"context_length\"]\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "2b0f4d5d-17fd-4265-93d8-ea08a22fdaf8",
"metadata": {},
"source": [
"For convenience, you can run this experiment via\n",
"\n",
"```\n",
"python additional-experiments.py --context_length \"model_context_length\"\n",
"```\n",
"\n",
"using the code in the [../02_bonus_additional-experiments](../02_bonus_additional-experiments) folder, which results in a substantially worse test accuracy of 78.33% (versus the 95.67% in the main chapter)."
]
},
{
"cell_type": "markdown",
"id": "5a780455-f52a-48d1-ab82-6afd40bcad8b",
"metadata": {},
"source": [
"## Exercise 6.2: Finetuning the whole model"
]
},
{
"cell_type": "markdown",
"id": "56aa5208-aa29-4165-a0ec-7480754e2a18",
"metadata": {},
"source": [
"Instead of finetuning just the final transformer block, we can finetune the entire model by removing the following lines from the code:\n",
"\n",
"```python\n",
"for param in model.parameters():\n",
" param.requires_grad = False\n",
"```\n",
"\n",
"For convenience, you can run this experiment via\n",
"\n",
"```\n",
"python additional-experiments.py --trainable_layers all\n",
"```\n",
"\n",
"using the code in the [../02_bonus_additional-experiments](../02_bonus_additional-experiments) folder, which results in a 1% improved test accuracy of 96.67% (versus the 95.67% in the main chapter)."
]
},
{
"cell_type": "markdown",
"id": "2269bce3-f2b5-4a76-a692-5977c75a57b6",
"metadata": {},
"source": [
"## Exercise 6.3: Finetuning the first versus last token "
]
},
{
"cell_type": "markdown",
"id": "7418a629-51b6-4aa2-83b7-bc0261bc370f",
"metadata": {},
"source": [
"Rather than finetuning the last output token, we can finetune the first output token by changing \n",
"\n",
"```python\n",
"model(input_batch)[:, -1, :]\n",
"```\n",
"\n",
"to\n",
"\n",
"```python\n",
"model(input_batch)[:, 0, :]\n",
"```\n",
"\n",
"everywhere in the code.\n",
"\n",
"For convenience, you can run this experiment via\n",
"\n",
"```\n",
"python additional-experiments.py --trainable_token first\n",
"```\n",
"\n",
"using the code in the [../02_bonus_additional-experiments](../02_bonus_additional-experiments) folder, which results in a substantially worse test accuracy of 75.00% (versus the 95.67% in the main chapter)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e5e6188a-f182-4f26-b9e5-ccae3ecadae0",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}