Fixed some typos in ch06.ipynb (#219)

Jinge Wang 2024-06-18 18:54:01 +08:00 committed by GitHub
parent c8c0fd4fb5
commit 8e2c8d0987


@@ -123,7 +123,7 @@
 "source": [
 "- Classification finetuning, the topic of this chapter, is a procedure you may already be familiar with if you have a background in machine learning -- it's similar to training a convolutional network to classify handwritten digits, for example\n",
 "- In classification finetuning, we have a specific number of class labels (for example, \"spam\" and \"not spam\") that the model can output\n",
-"- A classification finetuned model can only predict classes it has seen during training (for example, \"spam\" or \"not spam\", whereas an instruction-finetuned model can usually perform many tasks\n",
+"- A classification finetuned model can only predict classes it has seen during training (for example, \"spam\" or \"not spam\"), whereas an instruction-finetuned model can usually perform many tasks\n",
 "- We can think of a classification-finetuned model as a very specialized model; in practice, it is much easier to create a specialized model than a generalist model that performs well on many different tasks"
 ]
 },
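To make the fixed-label point in the cell above concrete, here is a tiny illustrative snippet (not from the notebook): whatever the input, a classification-finetuned spam model can only ever emit one of the labels it was trained on. The logits values and label names are made up for illustration.

```python
import torch

# Hypothetical logits from a model finetuned on exactly two classes
logits = torch.tensor([[-1.2, 2.8]])          # shape: (batch_size, num_classes)
label_names = {0: "not spam", 1: "spam"}      # the only labels the model can predict

predicted_class = torch.argmax(logits, dim=-1).item()
print(label_names[predicted_class])           # -> "spam"
```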
@@ -474,7 +474,7 @@
 "id": "5715e685-35b4-4b45-a86c-8a8694de9d6f"
 },
 "source": [
-"- Let's now define a function that randomly divides the dataset into a training, validation, and test subset"
+"- Let's now define a function that randomly divides the dataset into training, validation, and test subsets"
 ]
 },
 {
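As a rough illustration of the split described in that cell, here is a minimal sketch. It assumes the dataset is held in a pandas DataFrame `df`; the function name, seed, and fractions are illustrative rather than the notebook's exact code.

```python
import pandas as pd

def random_split(df, train_frac=0.7, validation_frac=0.1):
    # Shuffle the rows with a fixed seed for reproducibility
    df = df.sample(frac=1, random_state=123).reset_index(drop=True)

    # Slice into three consecutive, non-overlapping parts;
    # the remainder (here 20%) becomes the test set
    train_end = int(len(df) * train_frac)
    validation_end = train_end + int(len(df) * validation_frac)

    train_df = df[:train_end]
    validation_df = df[train_end:validation_end]
    test_df = df[validation_end:]
    return train_df, validation_df, test_df
```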
@@ -525,8 +525,8 @@
 },
 "source": [
 "- Note that the text messages have different lengths; if we want to combine multiple training examples in a batch, we have to either\n",
-" - 1. truncate all messages to the length of the shortest message in the dataset or batch\n",
-" - 2. pad all messages to the length of the longest message in the dataset or batch\n",
+" 1. truncate all messages to the length of the shortest message in the dataset or batch\n",
+" 2. pad all messages to the length of the longest message in the dataset or batch\n",
 "\n",
 "- We choose option 2 and pad all messages to the longest message in the dataset\n",
 "- For that, we use `<|endoftext|>` as a padding token, as discussed in chapter 2"
@@ -922,7 +922,7 @@
 "id": "ab8e056c-abe0-415f-b34d-df686204259e",
 "metadata": {},
 "source": [
-"- To ensure that the model was loaded corrected, let's double-check that it generates coherent text"
+"- To ensure that the model was loaded correctly, let's double-check that it generates coherent text"
 ]
 },
 {
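One way to run such a sanity check is a simple greedy-decoding loop like the sketch below. It assumes `model` is the GPT model loaded earlier in the notebook and that it returns logits of shape `(batch, num_tokens, vocab_size)`; the prompt, helper name, and context size are illustrative.

```python
import torch
import tiktoken

tokenizer = tiktoken.get_encoding("gpt2")

def greedy_generate(model, prompt, max_new_tokens=20, context_size=1024):
    model.eval()
    idx = torch.tensor(tokenizer.encode(prompt)).unsqueeze(0)  # (1, num_tokens)
    for _ in range(max_new_tokens):
        with torch.no_grad():
            logits = model(idx[:, -context_size:])
        # Pick the most likely next token (greedy decoding)
        next_id = torch.argmax(logits[:, -1, :], dim=-1, keepdim=True)
        idx = torch.cat([idx, next_id], dim=1)
    return tokenizer.decode(idx.squeeze(0).tolist())

print(greedy_generate(model, "Every effort moves you"))  # should read as coherent English
```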
@@ -1350,7 +1350,7 @@
 "metadata": {},
 "source": [
 "- Technically, it's sufficient to only train the output layer\n",
-"- However, as I found in [experiments finetuning additional layers](https://magazine.sebastianraschka.com/p/finetuning-large-language-models) can noticeably improve the performance\n",
+"- However, as I found in [Finetuning Large Language Models](https://magazine.sebastianraschka.com/p/finetuning-large-language-models), experiments show that finetuning additional layers can noticeably improve the performance\n",
 "- So, we are also making the last transformer block and the final `LayerNorm` module connecting the last transformer block to the output layer trainable"
 ]
 },
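A sketch of this setup is shown below. It assumes `model` is the loaded GPT model with attributes named `trf_blocks`, `final_norm`, and `out_head` and an embedding width of 768 (GPT-2 small); if your implementation uses different attribute names or sizes, adjust accordingly.

```python
import torch

num_classes = 2  # "not spam" and "spam"

# Freeze all existing parameters first
for param in model.parameters():
    param.requires_grad = False

# Replace the language-model head with a small classification head;
# newly created modules are trainable by default
model.out_head = torch.nn.Linear(in_features=768, out_features=num_classes)

# Additionally make the last transformer block and the final LayerNorm trainable
for param in model.trf_blocks[-1].parameters():
    param.requires_grad = True
for param in model.final_norm.parameters():
    param.requires_grad = True
```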
@@ -2147,7 +2147,7 @@
 "id": "6882649f-dc7b-401f-84d2-024ff79c74a1",
 "metadata": {},
 "source": [
-"- We can see that the training and test set performances are practically identical\n",
+"- We can see that the training and validation set performances are practically identical\n",
 "- However, based on the slightly lower test set performance, we can see that the model overfits the training data to a very small degree, as well as the validation data that has been used for tweaking some of the hyperparameters, such as the learning rate\n",
 "- This is normal, however, and this gap could potentially be further reduced by increasing the model's dropout rate (`drop_rate`) or the `weight_decay` in the optimizer setting" 
 ]
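For reference, a hedged sketch of those two knobs is shown below; the values are illustrative rather than tuned recommendations, `model` is assumed to be the classifier from earlier in the notebook, and the config dict name is only an assumption about the surrounding code.

```python
import torch

# A higher dropout rate would be set in the model configuration before
# (re)initializing the model, e.g. a config dict entry such as
# GPT_CONFIG["drop_rate"] = 0.2  (hypothetical name and value)

# A larger weight_decay strengthens the L2-style regularization in AdamW
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5, weight_decay=0.2)
```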