{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "136a4efe-fb99-4311-8679-e0a5b6282755",
   "metadata": {},
   "source": [
    "<table style=\"width:100%\">\n",
    "<tr>\n",
    "<td style=\"vertical-align:middle; text-align:left;\">\n",
    "<font size=\"2\">\n",
    "Supplementary code for the <a href=\"http://mng.bz/orYv\">Build a Large Language Model From Scratch</a> book by <a href=\"https://sebastianraschka.com\">Sebastian Raschka</a><br>\n",
    "<br>Code repository: <a href=\"https://github.com/rasbt/LLMs-from-scratch\">https://github.com/rasbt/LLMs-from-scratch</a>\n",
    "</font>\n",
    "</td>\n",
    "<td style=\"vertical-align:middle; text-align:left;\">\n",
    "<a href=\"http://mng.bz/orYv\"><img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/cover-small.webp\" width=\"100px\"></a>\n",
    "</td>\n",
    "</tr>\n",
    "</table>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b1910a06-e8a3-40ac-8201-ff70615b1ba4",
   "metadata": {
    "tags": []
   },
   "source": [
    "# Generating A Preference Dataset With Llama 3.1 70B And Ollama"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a128651b-f326-4232-a994-42f38b7ed520",
   "metadata": {},
   "source": [
    "- Preference finetuning is a process to align an instruction-finetuned LLM with human preferences\n",
    "- There are multiple ways to create a dataset for preference finetuning an LLM:\n",
    "  1. We use the instruction-finetuned LLM to generate multiple responses and have humans rank them based on their preference and/or given preference criteria\n",
    "  2. We use the instruction-finetuned LLM to generate multiple responses and have LLMs rank them based on given preference criteria\n",
    "  3. We use an LLM to generate preferred and dispreferred responses given certain preference criteria\n",
    "- In this notebook, we consider approach 3\n",
    "- This notebook uses a 70-billion-parameter Llama 3.1-Instruct model through ollama to generate preference labels for an instruction dataset\n",
    "- The expected format of the instruction dataset is as follows:\n",
    "\n",
    "### Input\n",
    "\n",
    "```json\n",
    "[\n",
    "    {\n",
    "        \"instruction\": \"What is the state capital of California?\",\n",
    "        \"input\": \"\",\n",
    "        \"output\": \"The state capital of California is Sacramento.\"\n",
    "    },\n",
    "    {\n",
    "        \"instruction\": \"Provide a synonym for 'fast'.\",\n",
    "        \"input\": \"\",\n",
    "        \"output\": \"A synonym for 'fast' is 'quick'.\"\n",
    "    },\n",
    "    {\n",
    "        \"instruction\": \"What is the capital of Greece?\",\n",
    "        \"input\": \"\",\n",
    "        \"output\": \"The capital of Greece is Athens.\"\n",
    "    },\n",
    "...\n",
    "]\n",
    "```\n",
    "\n",
    "### Output\n",
    "\n",
    "The output dataset will look as follows, where more polite responses are preferred (`'chosen'`), and more impolite responses are dispreferred (`'rejected'`):\n",
    "\n",
    "```json\n",
    "[\n",
    "    {\n",
    "        \"instruction\": \"What is the state capital of California?\",\n",
    "        \"input\": \"\",\n",
    "        \"output\": \"The state capital of California is Sacramento.\",\n",
    "        \"rejected\": \"Look, the state capital of California is obviously Sacramento.\",\n",
    "        \"chosen\": \"The state capital of California is Sacramento.\"\n",
    "    },\n",
    "    {\n",
    "        \"instruction\": \"Provide a synonym for 'fast'.\",\n",
    "        \"input\": \"\",\n",
    "        \"output\": \"A synonym for 'fast' is 'quick'.\",\n",
    "        \"chosen\": \"A suitable alternative to 'fast' would be 'quick'.\",\n",
    "        \"rejected\": \"A synonym for 'fast' is 'quick'.\"\n",
    "    },\n",
    "    {\n",
    "        \"instruction\": \"What is the capital of Greece?\",\n",
    "        \"input\": \"\",\n",
    "        \"output\": \"The capital of Greece is Athens.\",\n",
    "        \"chosen\": \"I'd be happy to help! The capital of Greece is indeed Athens.\",\n",
    "        \"rejected\": \"The capital of Greece is Athens.\"\n",
    "    },\n",
    "...\n",
    "]\n",
    "```\n",
    "\n",
    "- The code doesn't require a GPU and runs on a laptop given enough RAM"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "63610acc-db94-437f-8d38-e99dca0299cb",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tqdm version: 4.66.4\n"
     ]
    }
   ],
   "source": [
    "from importlib.metadata import version\n",
    "\n",
    "pkgs = [\"tqdm\",    # Progress bar\n",
    "        ]\n",
    "\n",
    "for p in pkgs:\n",
    "    print(f\"{p} version: {version(p)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8bcdcb34-ac75-4f4f-9505-3ce0666c42d5",
   "metadata": {},
   "source": [
    "## Installing Ollama and Downloading Llama 3.1"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5a092280-5462-4709-a3fe-8669a4a8a0a6",
   "metadata": {},
   "source": [
    "- Ollama is an application to run LLMs efficiently\n",
    "- It is a wrapper around [llama.cpp](https://github.com/ggerganov/llama.cpp), which implements LLMs in pure C/C++ to maximize efficiency\n",
    "- Note that it is a tool for using LLMs to generate text (inference), not for training or finetuning LLMs\n",
    "- Prior to running the code below, install ollama by visiting [https://ollama.com](https://ollama.com) and following the instructions (for instance, clicking on the \"Download\" button and downloading the ollama application for your operating system)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9558a522-650d-401a-84fc-9fd7b1f39da7",
   "metadata": {},
   "source": [
    "- For macOS and Windows users, click on the ollama application you downloaded; if it prompts you to install the command line usage, say \"yes\"\n",
    "- Linux users can use the installation command provided on the ollama website\n",
    "\n",
    "- In general, before we can use ollama from the command line, we have to either start the ollama application or run `ollama serve` in a separate terminal\n",
    "\n",
    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/bonus/ollama-eval/ollama-serve.webp?1\">\n",
    "\n",
    "- With the ollama application or `ollama serve` running, in a different terminal, execute the following command on the command line to try out the 70-billion-parameter Llama 3.1 model:\n",
    "\n",
    "```bash\n",
    "# 70B model\n",
    "ollama run llama3.1:70b\n",
    "```\n",
    "\n",
    "- The output looks as follows:\n",
    "\n",
    "```\n",
    "$ ollama run llama3.1:70b\n",
    "pulling manifest\n",
    "pulling aa81b541aae6... 100% ▕████████████████▏ 39 GB\n",
    "pulling 8cf247399e57... 100% ▕████████████████▏ 1.7 KB\n",
    "pulling f1cd752815fc... 100% ▕████████████████▏ 12 KB\n",
    "pulling 56bb8bd477a5... 100% ▕████████████████▏ 96 B\n",
    "pulling 3c1c2d3df5b3... 100% ▕████████████████▏ 486 B\n",
    "verifying sha256 digest\n",
    "writing manifest\n",
    "removing any unused layers\n",
    "success\n",
    "```\n",
    "\n",
    "- Note that `llama3.1:70b` refers to the instruction-finetuned 70-billion-parameter Llama 3.1 model\n",
    "\n",
    "- Alternatively, you can also use the smaller, more resource-efficient 8-billion-parameter Llama 3.1 model by replacing `llama3.1:70b` with `llama3.1`\n",
    "\n",
    "- After the download has been completed, you will see a command line prompt that allows you to chat with the model\n",
    "\n",
    "- Try a prompt like \"What do llamas eat?\", which should return an output similar to the following:\n",
    "\n",
    "```\n",
    ">>> What do llamas eat?\n",
    "Llamas are ruminant animals, which means they have a four-chambered \n",
    "stomach and eat plants that are high in fiber. In the wild, llamas \n",
    "typically feed on:\n",
    "1. Grasses: They love to graze on various types of grasses, including tall \n",
    "grasses, wheat, oats, and barley.\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0b5addcb-fc7d-455d-bee9-6cc7a0d684c7",
   "metadata": {},
   "source": [
    "- You can end this session using the input `/bye`"
   ]
  },
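  {
   "cell_type": "markdown",
   "id": "3c6e1a2b-7e4d-4b1a-9f3e-2a1b0c4d5e6f",
   "metadata": {},
   "source": [
    "- Optionally, we can also verify from within Python that an ollama session is running before we send any requests\n",
    "- The following is a minimal sketch of such a check; it assumes that the third-party `psutil` package is installed and that the running ollama process name contains the string \"ollama\":"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8d9e0f1a-2b3c-4d5e-6f70-8192a3b4c5d6",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Minimal sketch: check whether an ollama process is running\n",
    "# (assumes the third-party `psutil` package is installed)\n",
    "import psutil\n",
    "\n",
    "\n",
    "def check_if_running(process_name):\n",
    "    # Scan the process table for a process whose name contains `process_name`\n",
    "    for proc in psutil.process_iter([\"name\"]):\n",
    "        if proc.info[\"name\"] and process_name in proc.info[\"name\"]:\n",
    "            return True\n",
    "    return False\n",
    "\n",
    "\n",
    "print(\"Ollama running:\", check_if_running(\"ollama\"))"
   ]
  },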
  {
   "cell_type": "markdown",
   "id": "dda155ee-cf36-44d3-b634-20ba8e1ca38a",
   "metadata": {},
   "source": [
    "## Using Ollama's REST API"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "89343a84-0ddc-42fc-bf50-298a342b93c0",
   "metadata": {},
   "source": [
    "- Now, an alternative way to interact with the model is through its REST API in Python, using the following function\n",
    "- Before you run the next cells in this notebook, make sure that ollama is still running, as described above, via\n",
    "  - `ollama serve` in a terminal\n",
    "  - the ollama application\n",
    "- Next, run the following code cell to query the model"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "16642a48-1cab-40d2-af08-ab8c2fbf5876",
   "metadata": {},
   "source": [
    "- First, let's try the API with a simple example to make sure it works as intended:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "65b0ba76-1fb1-4306-a7c2-8f3bb637ccdb",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Llamas are herbivores, which means they primarily eat plants and plant-based foods. Their diet consists of:\n",
      "\n",
      "1. **Grasses**: Various types of grasses, including timothy grass, orchard grass, and brome grass.\n",
      "2. **Hay**: High-quality hay, such as alfalfa or clover hay, is a staple in a llama's diet.\n",
      "3. **Leaves**: Leaves from trees and shrubs, like willow, cottonwood, and mesquite, are also eaten.\n",
      "4. **Fruits and vegetables**: Llamas enjoy fruits like apples, carrots, and sweet potatoes, as well as leafy greens like kale and spinach.\n",
      "5. **Grains**: In moderation, llamas can eat grains like oats, barley, and corn.\n",
      "\n",
      "It's essential to note that llamas have a unique digestive system, with a three-part stomach and a large cecum (a specialized part of the large intestine). This allows them to break down and extract nutrients from plant material more efficiently than many other animals.\n",
      "\n",
      "A typical llama diet might consist of:\n",
      "\n",
      "* 1-2% of their body weight in hay per day\n",
      "* 0.5-1% of their body weight in grains per day (if fed)\n",
      "* Free-choice access to fresh water\n",
      "* Limited amounts of fruits and vegetables as treats\n",
      "\n",
      "It's also important to ensure that llamas have access to a mineral supplement, such as a salt lick or loose minerals, to help maintain optimal health.\n",
      "\n",
      "Remember, every llama is different, and their dietary needs may vary depending on factors like age, size, and activity level. Consult with a veterinarian or experienced llama breeder for specific guidance on feeding your llama.\n"
     ]
    }
   ],
   "source": [
    "import urllib.request\n",
    "import json\n",
    "\n",
    "\n",
    "def query_model(prompt, model=\"llama3.1:70b\", url=\"http://localhost:11434/api/chat\"):\n",
    "    # Create the data payload as a dictionary\n",
    "    data = {\n",
    "        \"model\": model,\n",
    "        \"messages\": [\n",
    "            {\n",
    "                \"role\": \"user\",\n",
    "                \"content\": prompt\n",
    "            }\n",
    "        ],\n",
    "        \"options\": {\n",
    "            \"seed\": 123,\n",
    "            \"temperature\": 0,\n",
    "        }\n",
    "    }\n",
    "\n",
    "    # Convert the dictionary to a JSON formatted string and encode it to bytes\n",
    "    payload = json.dumps(data).encode(\"utf-8\")\n",
    "\n",
    "    # Create a request object, setting the method to POST and adding necessary headers\n",
    "    request = urllib.request.Request(url, data=payload, method=\"POST\")\n",
    "    request.add_header(\"Content-Type\", \"application/json\")\n",
    "\n",
    "    # Send the request and capture the response\n",
    "    response_data = \"\"\n",
    "    with urllib.request.urlopen(request) as response:\n",
    "        # Read and decode the streamed response line by line\n",
    "        while True:\n",
    "            line = response.readline().decode(\"utf-8\")\n",
    "            if not line:\n",
    "                break\n",
    "            response_json = json.loads(line)\n",
    "            response_data += response_json[\"message\"][\"content\"]\n",
    "\n",
    "    return response_data\n",
    "\n",
    "\n",
    "result = query_model(\"What do Llamas eat?\")\n",
    "print(result)"
   ]
  },
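  {
   "cell_type": "markdown",
   "id": "1f2e3d4c-5b6a-4978-8c9d-0e1f2a3b4c5d",
   "metadata": {},
   "source": [
    "- As a side note, `query_model` above reads ollama's reply as a line-by-line stream of JSON objects\n",
    "- The Ollama REST API can also return the whole reply as a single JSON object if we set `\"stream\": false` in the payload; the variant below is a minimal sketch of this, assuming the same local endpoint and options as above:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9a8b7c6d-5e4f-4a3b-8c2d-1e0f9a8b7c6d",
   "metadata": {},
   "outputs": [],
   "source": [
    "def query_model_nonstreaming(prompt, model=\"llama3.1:70b\", url=\"http://localhost:11434/api/chat\"):\n",
    "    # Same request as in `query_model`, but ask ollama for a single,\n",
    "    # non-streamed reply by setting \"stream\" to False\n",
    "    data = {\n",
    "        \"model\": model,\n",
    "        \"messages\": [{\"role\": \"user\", \"content\": prompt}],\n",
    "        \"options\": {\"seed\": 123, \"temperature\": 0},\n",
    "        \"stream\": False\n",
    "    }\n",
    "\n",
    "    payload = json.dumps(data).encode(\"utf-8\")\n",
    "    request = urllib.request.Request(url, data=payload, method=\"POST\")\n",
    "    request.add_header(\"Content-Type\", \"application/json\")\n",
    "\n",
    "    # The entire reply arrives as one JSON object\n",
    "    with urllib.request.urlopen(request) as response:\n",
    "        response_json = json.loads(response.read().decode(\"utf-8\"))\n",
    "\n",
    "    return response_json[\"message\"][\"content\"]"
   ]
  },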
  {
   "cell_type": "markdown",
   "id": "162a4739-6f03-4092-a5c2-f57a0b6a4c4d",
   "metadata": {},
   "source": [
    "## Load JSON Entries"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ca011a8b-20c5-4101-979e-9b5fccf62f8a",
   "metadata": {},
   "source": [
    "- Now, let's get to the data generation part\n",
    "- Here, for a hands-on example, we use the `instruction-data.json` file that we originally used to instruction-finetune the model in chapter 7:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "8b2d393a-aa92-4190-9d44-44326a6f699b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of entries: 1100\n"
     ]
    }
   ],
   "source": [
    "from pathlib import Path\n",
    "\n",
    "json_file = Path(\"..\", \"01_main-chapter-code\", \"instruction-data.json\")\n",
    "\n",
    "with open(json_file, \"r\") as file:\n",
    "    json_data = json.load(file)\n",
    "\n",
    "print(\"Number of entries:\", len(json_data))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b6c9751b-59b7-43fe-acc7-14e8daf2fa66",
   "metadata": {},
   "source": [
    "- The structure of this file is as follows, where `'output'` is the given response in the dataset that we trained the model to generate via instruction finetuning, based on the `'instruction'` and `'input'`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "7222fdc0-5684-4f2b-b741-3e341851359e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'instruction': 'Evaluate the following phrase by transforming it into the spelling given.',\n",
       " 'input': 'freind --> friend',\n",
       " 'output': 'The spelling of the given phrase \"freind\" is incorrect, the correct spelling is \"friend\".'}"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "json_data[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fcf0331b-6024-4bba-89a9-a088b14a1046",
   "metadata": {},
   "source": [
    "- Below is a small utility function that formats the instruction and input:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "43263cd3-e5fb-4ab5-871e-3ad6e7d21a8c",
   "metadata": {},
   "outputs": [],
   "source": [
    "def format_input(entry):\n",
    "    # Alpaca-style prompt formatting, as used in chapter 7\n",
    "    instruction_text = (\n",
    "        f\"Below is an instruction that describes a task. Write a response that \"\n",
    "        f\"appropriately completes the request.\"\n",
    "        f\"\\n\\n### Instruction:\\n{entry['instruction']}\"\n",
    "    )\n",
    "\n",
    "    input_text = f\"\\n\\n### Input:\\n{entry['input']}\" if entry[\"input\"] else \"\"\n",
    "\n",
    "    return instruction_text + input_text"
   ]
  },
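  {
   "cell_type": "markdown",
   "id": "2b3c4d5e-6f70-4182-93a4-b5c6d7e8f901",
   "metadata": {},
   "source": [
    "- For example, applied to the first dataset entry shown earlier, `format_input` produces the Alpaca-style prompt the model was finetuned on:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7e8f9a0b-1c2d-4e3f-8a5b-6c7d8e9f0a1b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Apply the formatting utility to the first dataset entry\n",
    "print(format_input(json_data[0]))"
   ]
  },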
  {
   "cell_type": "markdown",
   "id": "39a55283-7d51-4136-ba60-f799d49f4098",
   "metadata": {},
   "source": [
    "- Now, let's try the ollama API to generate a `'chosen'` and `'rejected'` response for preference tuning a model\n",
    "- Here, for illustration purposes, we create answers that are more or less polite"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "735cc089-d127-480a-b39d-0782581f0c41",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Dataset response:\n",
      ">> The spelling of the given phrase \"freind\" is incorrect, the correct spelling is \"friend\".\n",
      "\n",
      "impolite response:\n",
      ">> The spelling of the given phrase \"freind\" is flat out wrong, get it together, the correct spelling is \"friend\".\n",
      "\n",
      "Dataset response:\n",
      ">> He goes to the park every day.\n",
      "\n",
      "polite response:\n",
      ">> He goes to the park daily, if I'm not mistaken.\n",
      "\n",
      "Dataset response:\n",
      ">> 45 kilometers is 45000 meters.\n",
      "\n",
      "polite response:\n",
      ">> 45 kilometers is equivalent to 45000 meters.\n",
      "\n",
      "Dataset response:\n",
      ">> Although it was raining, they went for a walk.\n",
      "\n",
      "polite response:\n",
      ">> Although it was raining outside, they still decided to go for a walk.\n",
      "\n",
      "Dataset response:\n",
      ">> 1, 4, 9, 16, 25, 36, 49, 64, 81, 100.\n",
      "\n",
      "impolite response:\n",
      ">> Here are your precious square numbers: 1, 4, 9, 16, 25, 36, 49, 64, 81, 100.\n"
     ]
    }
   ],
   "source": [
    "import random\n",
    "\n",
    "\n",
    "for entry in json_data[:5]:\n",
    "\n",
    "    politeness = random.choice([\"polite\", \"impolite\"])\n",
    "    prompt = (\n",
    "        f\"Given the input `{format_input(entry)}` \"\n",
    "        f\"and correct output `{entry['output']}`, \"\n",
    "        f\"slightly rewrite the output to be more {politeness}. \"\n",
    "        \"Keep the modification minimal. \"\n",
    "        \"Only return the generated response and nothing else.\"\n",
    "    )\n",
    "    print(\"\\nDataset response:\")\n",
    "    print(\">>\", entry['output'])\n",
    "    print(f\"\\n{politeness} response:\")\n",
    "    print(\">>\", query_model(prompt))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "142dfaa7-429f-4eb0-b74d-ff327f79547a",
   "metadata": {},
   "source": [
    "- If we find that the generated responses above look reasonable, we can go to the next step and apply the prompt to the whole dataset\n",
    "- Here, we add a `'chosen'` key for the preferred response and a `'rejected'` key for the dispreferred response"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "3349dbbc-963f-4af3-9790-12dbfdca63c3",
   "metadata": {},
   "outputs": [],
   "source": [
    "import random\n",
    "from tqdm import tqdm\n",
    "\n",
    "\n",
    "def generate_model_responses(json_data):\n",
    "\n",
    "    for i, entry in enumerate(tqdm(json_data, desc=\"Writing entries\")):\n",
    "        politeness = random.choice([\"polite\", \"impolite\"])\n",
    "        prompt = (\n",
    "            f\"Given the input `{format_input(entry)}` \"\n",
    "            f\"and correct output `{entry['output']}`, \"\n",
    "            f\"slightly rewrite the output to be more {politeness}. \"\n",
    "            \"Keep the modification minimal. \"\n",
    "            \"Only return the generated response and nothing else.\"\n",
    "        )\n",
    "        response = query_model(prompt)\n",
    "\n",
    "        # The rewritten response is preferred if it is the polite one;\n",
    "        # otherwise, the original dataset response is preferred\n",
    "        if politeness == \"polite\":\n",
    "            json_data[i][\"chosen\"] = response\n",
    "            json_data[i][\"rejected\"] = entry[\"output\"]\n",
    "        else:\n",
    "            json_data[i][\"rejected\"] = response\n",
    "            json_data[i][\"chosen\"] = entry[\"output\"]"
   ]
  },
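  {
   "cell_type": "markdown",
   "id": "5d6e7f80-91a2-4b3c-8d4e-5f60718293a4",
   "metadata": {},
   "source": [
    "- Optionally, we can seed Python's pseudo-random number generator so that the polite/impolite assignment is reproducible (note that the model responses themselves may still vary slightly across systems, as discussed below):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0a1b2c3d-4e5f-4607-8192-a3b4c5d6e7f8",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Optional: make the polite/impolite coin flips reproducible\n",
    "random.seed(123)"
   ]
  },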
  {
   "cell_type": "markdown",
   "id": "b071ce84-1866-427f-a272-b46700f364b2",
   "metadata": {},
   "source": [
    "- Let's now apply this response generation to the whole dataset (this takes about 17 minutes with the 70B model on an M3 MacBook Air laptop, as the progress bar below shows)\n",
    "- Note that ollama is not fully deterministic across operating systems (as of this writing), so the responses you are getting might slightly differ from the ones shown below"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "4f700d4b-19e5-4404-afa7-b0f093024232",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Writing entries: 100%|██████████| 1100/1100 [17:20<00:00,  1.06it/s]\n"
     ]
    }
   ],
   "source": [
    "generate_model_responses(json_data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "838d9747-0f7d-46fe-aab5-9ee6b765d021",
   "metadata": {},
   "outputs": [],
   "source": [
    "with open(\"instruction-data-with-preference.json\", \"w\") as file:\n",
    "    json.dump(json_data, file, indent=4)"
   ]
  },
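  {
   "cell_type": "markdown",
   "id": "6f708192-a3b4-4c5d-8e6f-708192a3b4c5",
   "metadata": {},
   "source": [
    "- As a quick sanity check, we can reload the saved file and inspect the first entry (the exact `'chosen'` and `'rejected'` texts will depend on the model run):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4c5d6e7f-8091-42a3-b4c5-d6e7f8091a2b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Reload the saved preference dataset and inspect the first entry\n",
    "with open(\"instruction-data-with-preference.json\", \"r\") as file:\n",
    "    preference_data = json.load(file)\n",
    "\n",
    "print(json.dumps(preference_data[0], indent=4))"
   ]
  }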
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}