update figure 6.6

2025-11-02 19:00:14 +00:00 · 2024-05-08 20:46:54 -05:00 · 2024-05-08 20:46:54 -05:00 · 1e7d1f3bcb
commit 1e7d1f3bcb
parent a31d571625
1 changed files with 9 additions and 28 deletions
--- a/ch06/01_main-chapter-code/ch06.ipynb
+++ b/ch06/01_main-chapter-code/ch06.ipynb
@ -523,6 +523,14 @@
    "- For that, we use `<|endoftext|>` as a padding token, as discussed in chapter 2"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0829f33f-1428-4f22-9886-7fee633b3666",
   "metadata": {},
   "source": [
    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/pad-input-sequences.webp?123\" width=500px>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
@ -550,25 +558,6 @@
    "print(tokenizer.encode(\"<|endoftext|>\", allowed_special={\"<|endoftext|>\"}))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "0ff0f6b2-376b-4740-8858-55b60784be73",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1212, 318, 262, 717, 2420, 3275]\n"
     ]
    }
   ],
   "source": [
    "token_ids = tokenizer.encode(\"This is the first text message\")\n",
    "print(token_ids)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "04f582ff-68bf-450e-bd87-5fb61afe431c",
@ -579,14 +568,6 @@
    "- The `SpamDataset` class below identifies the longest sequence in the training dataset and adds the padding token to the others to match that sequence length"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0829f33f-1428-4f22-9886-7fee633b3666",
   "metadata": {},
   "source": [
    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/ch06_compressed/pad-input-sequences.webp\" width=500px>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
@ -2284,7 +2265,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-   "version": "3.11.4"
+   "version": "3.10.12"
  }
 },
 "nbformat": 4,