diff --git a/ch05/01_main-chapter-code/ch05.ipynb b/ch05/01_main-chapter-code/ch05.ipynb
index 0b4fb25..cd90777 100644
--- a/ch05/01_main-chapter-code/ch05.ipynb
+++ b/ch05/01_main-chapter-code/ch05.ipynb
@@ -161,7 +161,7 @@
    "source": [
     "- We use dropout of 0.1 above, but it's relatively common to train LLMs without dropout nowadays\n",
     "- Modern LLMs also don't use bias vectors in the `nn.Linear` layers for the query, key, and value matrices (unlike earlier GPT models), which is achieved by setting `\"qkv_bias\": False`\n",
-    "- We reduce the context length (`context_length`) of only 256 tokens to reduce the computational resource requirements for training the model, whereas the original 124 million parameter GPT-2 model used 1024 characters\n",
+    "- We reduce the context length (`context_length`) of only 256 tokens to reduce the computational resource requirements for training the model, whereas the original 124 million parameter GPT-2 model used 1024 tokens\n",
     "  - This is so that more readers will be able to follow and execute the code examples on their laptop computer\n",
     "  - However, please feel free to increase the `context_length` to 1024 tokens (this would not require any code changes)\n",
     "  - We will also load a model with a 1024 `context_length` later from pretrained weights"
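
The markdown cell changed above describes configuration choices for the model trained in this chapter. A minimal sketch of what such a configuration dictionary might look like is given below; it is not part of the diff. The values for `context_length`, the dropout rate, and `qkv_bias` come from the cell text itself, while the remaining key names and values are assumptions based on the standard 124-million-parameter GPT-2 architecture.

```python
# Illustrative sketch (not part of the diff): a GPT-2 124M-style configuration
# reflecting the settings described in the cell above. Key names other than
# "context_length" and "qkv_bias" (e.g., "drop_rate", "emb_dim") are assumptions.
GPT_CONFIG_124M = {
    "vocab_size": 50257,     # BPE vocabulary size used by GPT-2
    "context_length": 256,   # shortened from GPT-2's original 1024 tokens
    "emb_dim": 768,          # embedding dimension of the 124M-parameter model
    "n_heads": 12,           # number of attention heads
    "n_layers": 12,          # number of transformer blocks
    "drop_rate": 0.1,        # dropout of 0.1, as mentioned in the cell
    "qkv_bias": False,       # no bias in the query/key/value projections
}
```

Raising `"context_length"` to 1024 would match the original GPT-2 setting and, as the cell notes, would not require any other code changes, only more memory and compute during training.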