Mirror of https://github.com/rasbt/LLMs-from-scratch.git (synced 2025-10-31 01:41:26 +00:00)
	make spam spelling consistent
parent 41ff2ae4c7
commit 6cc9cf9f4e
@@ -1415,7 +1415,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.10.12"
+   "version": "3.11.4"
   }
  },
  "nbformat": 4,
@@ -152,7 +152,7 @@
    },
    "source": [
     "- This section prepares the dataset we use for classification finetuning\n",
-    "- We use a dataset consisting of SPAM and non-SPAM text messages to finetune the LLM to classify them\n",
+    "- We use a dataset consisting of spam and non-spam text messages to finetune the LLM to classify them\n",
     "- First, we download and unzip the dataset"
    ]
   },
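The notebook cell this hunk annotates downloads and unzips the SMS text-message dataset. Below is a minimal sketch of such a download-and-unzip step; the URL and local file names are assumptions for illustration and are not quoted from this diff.

# Sketch of a download-and-unzip step for the SMS spam dataset.
# The URL and local paths are assumptions, not quoted from the notebook.
import urllib.request
import zipfile
from pathlib import Path

url = "https://archive.ics.uci.edu/static/public/228/sms+spam+collection.zip"
zip_path = Path("sms_spam_collection.zip")
extract_dir = Path("sms_spam_collection")

if not extract_dir.exists():
    with urllib.request.urlopen(url) as response:
        zip_path.write_bytes(response.read())
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        zip_ref.extractall(extract_dir)

print("Extracted files:", [p.name for p in extract_dir.iterdir()])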
@@ -354,7 +354,7 @@
     "id": "e7b6e631-4f0b-4aab-82b9-8898e6663109"
    },
    "source": [
-    "- When we check the class distribution, we see that the data contains \"ham\" (i.e., not-SPAM) much more frequently than \"spam\""
+    "- When we check the class distribution, we see that the data contains \"ham\" (i.e., \"not spam\") much more frequently than \"spam\""
    ]
   },
   {
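For context, the class-distribution check this hunk refers to can be reproduced with pandas; the file path and column names below are assumptions for illustration.

# Sketch: inspect the "ham" vs. "spam" class distribution.
# File path and column names are assumed for illustration.
import pandas as pd

df = pd.read_csv(
    "sms_spam_collection/SMSSpamCollection", sep="\t",
    header=None, names=["Label", "Text"],
)
print(df["Label"].value_counts())  # "ham" appears far more often than "spam"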
@@ -424,7 +424,7 @@
     "    # Count the instances of \"spam\"\n",
     "    num_spam = df[df[\"Label\"] == \"spam\"].shape[0]\n",
     "    \n",
-    "    # Randomly sample \"ham' instances to match the number of 'spam' instances\n",
+    "    # Randomly sample \"ham\" instances to match the number of \"spam\" instances\n",
     "    ham_subset = df[df[\"Label\"] == \"ham\"].sample(num_spam, random_state=123)\n",
     "    \n",
     "    # Combine ham \"subset\" with \"spam\"\n",
@@ -443,7 +443,7 @@
     "id": "d3fd2f5a-06d8-4d30-a2e3-230b86c559d6"
    },
    "source": [
-    "- Next, we change the \"string\" class labels \"ham\" and \"spam\" into integer class labels 0 and 1:"
+    "- Next, we change the string class labels \"ham\" and \"spam\" into integer class labels 0 and 1:"
    ]
   },
   {
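The label conversion described in this hunk amounts to a single mapping step; a short sketch (the column name and the helper from the previous sketch are assumptions):

# Sketch: convert the string labels to integer class labels.
balanced_df = create_balanced_dataset(df)              # helper sketched above
balanced_df["Label"] = balanced_df["Label"].map({"ham": 0, "spam": 1})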
@@ -1330,7 +1330,7 @@
    "metadata": {},
    "source": [
     "- Then, we replace the output layer (`model.out_head`), which originally maps the layer inputs to 50,257 dimensions (the size of the vocabulary)\n",
-    "- Since we finetune the model for binary classification (predicting 2 classes, \"spam\" and \"ham\"), we can replace the output layer as shown below, which will be trainable by default\n",
+    "- Since we finetune the model for binary classification (predicting 2 classes, \"spam\" and \"not spam\"), we can replace the output layer as shown below, which will be trainable by default\n",
     "- Note that we use `BASE_CONFIG[\"emb_dim\"]` (which is equal to 768 in the `\"gpt2-small (124M)\"` model) to keep the code below more general"
    ]
   },
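Replacing the output head as described in this hunk can be sketched as follows; `model` and `BASE_CONFIG` are assumed to be the objects defined earlier in the notebook.

# Sketch: swap the 50,257-dimensional output head for a 2-class head.
# `model` and `BASE_CONFIG` are assumed to exist as in the notebook.
import torch

num_classes = 2  # 0 = "not spam", 1 = "spam"
model.out_head = torch.nn.Linear(
    in_features=BASE_CONFIG["emb_dim"],  # 768 for "gpt2-small (124M)"
    out_features=num_classes,
)
# A freshly constructed nn.Linear has requires_grad=True, so this new head is
# trainable by default.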
@@ -1538,7 +1538,7 @@
     "- Hence, instead, we minimize the cross entropy loss as a proxy for maximizing the classification accuracy (you can learn more about this topic in lecture 8 of my freely available [Introduction to Deep Learning](https://sebastianraschka.com/blog/2021/dl-course.html#l08-multinomial-logistic-regression--softmax-regression) class.\n",
     "\n",
     "- Note that in chapter 5, we calculated the cross entropy loss for the next predicted token over the 50,257 token IDs in the vocabulary\n",
-    "- Here, we calculate the cross entropy in a similar fashion; the only difference is that instead of 50,257 token IDs, we now have only two choices: spam (label 1) or ham (label 0).\n",
+    "- Here, we calculate the cross entropy in a similar fashion; the only difference is that instead of 50,257 token IDs, we now have only two choices: \"spam\" (label 1) or \"not spam\" (label 0).\n",
     "- In other words, the loss calculation training code is practically identical to the one in chapter 5, but we now only have two labels instead of 50,257 labels (token IDs).\n",
     "\n",
     "\n",
@@ -2071,7 +2071,7 @@
    "id": "a74d9ad7-3ec1-450e-8c9f-4fc46d3d5bb0",
    "metadata": {},
    "source": [
-    "## 6.8 Using the LLM as a SPAM classifier"
+    "## 6.8 Using the LLM as a spam classifier"
    ]
   },
   {
@@ -2284,7 +2284,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.10.12"
+   "version": "3.11.4"
   }
  },
  "nbformat": 4,