Mirror of https://github.com/rasbt/LLMs-from-scratch.git, synced 2025-11-13 08:34:56 +00:00
format the other GPT architecture sizes
This commit is contained in:
parent 40477c55b3
commit 496b52f842
@@ -884,7 +884,7 @@
 "source": [
 "torch.manual_seed(123)\n",
 "\n",
-"x = torch.rand(2, 6, 768)\n",
+"x = torch.rand(2, 6, 768) # Shape: [batch_size, num_tokens, emb_dim]\n",
 "block = TransformerBlock(GPT_CONFIG_124M)\n",
 "output = block(x)\n",
 "\n",
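
Note: the comment added in the hunk above documents the [batch_size, num_tokens, emb_dim] convention that the block's forward pass preserves. The following is a minimal, self-contained sketch of that shape check; it substitutes PyTorch's built-in nn.TransformerEncoderLayer as a stand-in for the chapter's TransformerBlock class (an assumption for illustration, not the book's implementation), since both map a [batch_size, num_tokens, emb_dim] tensor to an output of the same shape.

import torch
import torch.nn as nn

# Stand-in for the chapter's TransformerBlock (illustrative only):
# PyTorch's built-in encoder layer also maps [batch, tokens, emb] -> same shape.
block = nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True)

torch.manual_seed(123)
x = torch.rand(2, 6, 768)  # Shape: [batch_size, num_tokens, emb_dim]
output = block(x)

print("Input shape: ", x.shape)       # torch.Size([2, 6, 768])
print("Output shape:", output.shape)  # torch.Size([2, 6, 768])
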
@@ -1140,24 +1140,24 @@
 "id": "309a3be4-c20a-4657-b4e0-77c97510b47c",
 "metadata": {},
 "source": [
-"- Exercise: you can try the other configurations as well:\n",
+"- Exercise: you can try the following other configurations, which are referenced in the [GPT-2 paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf), as well.\n",
 "\n",
-"- GPT2-small (the 124M configuration we implemented):\n",
+" - **GPT2-small** (the 124M configuration we already implemented):\n",
 " - \"emb_dim\" = 768\n",
 " - \"n_layers\" = 12\n",
 " - \"n_heads\" = 12\n",
 "\n",
-"- GPT2-medium:\n",
+" - **GPT2-medium:**\n",
 " - \"emb_dim\" = 1024\n",
 " - \"n_layers\" = 24\n",
 " - \"n_heads\" = 16\n",
 "\n",
-"- GPT2-large:\n",
+" \n",
+" - **GPT2-large:**\n",
 " - \"emb_dim\" = 1280\n",
 " - \"n_layers\" = 36\n",
 " - \"n_heads\" = 20\n",
 "\n",
-"- GPT2-XL:\n",
+" \n",
+" - **GPT2-XL:**\n",
 " - \"emb_dim\" = 1600\n",
 " - \"n_layers\" = 48\n",
 " - \"n_heads\" = 25"
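
Note: the four configurations listed in the updated exercise text map directly onto the chapter's config-dictionary format. Below is a minimal sketch that collects them as Python dicts and merges them into a full config; the GPT_CONFIG_124M literal (keys beyond emb_dim, n_layers, n_heads) and the get_config helper are assumptions for illustration, not part of this commit.

# The per-size values (emb_dim, n_layers, n_heads) come from the exercise text above;
# the remaining keys mirror the chapter's GPT_CONFIG_124M (assumed here, not from this diff).
GPT_CONFIG_124M = {
    "vocab_size": 50257,
    "context_length": 1024,
    "emb_dim": 768,
    "n_heads": 12,
    "n_layers": 12,
    "drop_rate": 0.1,
    "qkv_bias": False,
}

model_configs = {
    "gpt2-small":  {"emb_dim": 768,  "n_layers": 12, "n_heads": 12},
    "gpt2-medium": {"emb_dim": 1024, "n_layers": 24, "n_heads": 16},
    "gpt2-large":  {"emb_dim": 1280, "n_layers": 36, "n_heads": 20},
    "gpt2-xl":     {"emb_dim": 1600, "n_layers": 48, "n_heads": 25},
}

def get_config(name):
    """Return a complete config dict for one of the GPT-2 sizes (hypothetical helper)."""
    cfg = GPT_CONFIG_124M.copy()
    cfg.update(model_configs[name])
    return cfg

print(get_config("gpt2-xl"))

A detail worth noting: emb_dim stays divisible by n_heads in every size, and the per-head dimension is 64 throughout (768/12, 1024/16, 1280/20, and 1600/25 all equal 64).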