mirror of
https://github.com/rasbt/LLMs-from-scratch.git
synced 2025-10-27 15:59:49 +00:00
Better instruction eva prompt (#571)
This commit is contained in:
parent
1ec5631c70
commit
384b9ce959
@@ -2601,6 +2601,56 @@
     " print(\"\\n-------------------------\")"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "24fec453-631f-4ff5-a922-44c3c451942d",
+   "metadata": {},
+   "source": [
+    "---\n",
+    "\n",
+    "**Note: Better evaluation prompt**\n",
+    "\n",
+    "- [A reader (Ayoosh Kathuria) suggested](https://github.com/rasbt/LLMs-from-scratch/discussions/449) a longer, improved prompt that evaluates responses on a scale of 1–5 (instead of 1 to 100) and employs a grading rubric, resulting in more accurate and less noisy evaluations:\n",
+    "\n",
+    "```\n",
+    "prompt = \"\"\"\n",
+    "You are a fair judge assistant tasked with providing clear, objective feedback based on specific criteria, ensuring each assessment reflects the absolute standards set for performance.\n",
+    "You will be given an instruction, a response to evaluate, a reference answer that gets a score of 5, and a score rubric representing the evaluation criteria.\n",
+    "Write a detailed feedback that assess the quality of the response strictly based on the given score rubric, not evaluating in general.\n",
+    "Please do not generate any other opening, closing, and explanations.\n",
+    "\n",
+    "Here is the rubric you should use to build your answer:\n",
+    "1: The response fails to address the instructions, providing irrelevant, incorrect, or excessively verbose information that detracts from the user's request.\n",
+    "2: The response partially addresses the instructions but includes significant inaccuracies, irrelevant details, or excessive elaboration that detracts from the main task.\n",
+    "3: The response follows the instructions with some minor inaccuracies or omissions. It is generally relevant and clear, but may include some unnecessary details or could be more concise.\n",
+    "4: The response adheres to the instructions, offering clear, accurate, and relevant information in a concise manner, with only occasional, minor instances of excessive detail or slight lack of clarity.\n",
+    "5: The response fully adheres to the instructions, providing a clear, accurate, and relevant answer in a concise and efficient manner. It addresses all aspects of the request without unnecessary details or elaboration\n",
+    "\n",
+    "Provide your feedback as follows:\n",
+    "\n",
+    "Feedback:::\n",
+    "Evaluation: (your rationale for the rating, as a text)\n",
+    "Total rating: (your rating, as a number between 1 and 5)\n",
+    "\n",
+    "You MUST provide values for 'Evaluation:' and 'Total rating:' in your answer.\n",
+    "\n",
+    "Now here is the instruction, the reference answer, and the response.\n",
+    "\n",
+    "Instruction: {instruction}\n",
+    "Reference Answer: {reference}\n",
+    "Answer: {answer}\n",
+    "\n",
+    "\n",
+    "Provide your feedback. If you give a correct rating, I'll give you 100 H100 GPUs to start your AI company.\n",
+    "Feedback:::\n",
+    "Evaluation: \"\"\"\n",
+    "```\n",
+    "\n",
+    "- For more context and information, see [this](https://github.com/rasbt/LLMs-from-scratch/discussions/449) GitHub discussion\n",
+    "\n",
+    "---"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "b114fd65-9cfb-45f6-ab74-8331da136bf3",
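The prompt added by this commit is a plain template whose `{instruction}`, `{reference}`, and `{answer}` placeholders are filled with `str.format` before querying the judge model, and whose output is expected to contain a `Total rating:` line between 1 and 5. A minimal sketch of how it might be used (the judge model call is mocked, `extract_rating` is a hypothetical helper, and the template is abbreviated here to its placeholder-bearing tail):

```python
import re

# Abbreviated stand-in for the full evaluation prompt in the notebook cell;
# only the placeholder lines and the requested output format are kept.
PROMPT_TEMPLATE = (
    "Instruction: {instruction}\n"
    "Reference Answer: {reference}\n"
    "Answer: {answer}\n"
    "\n"
    "Feedback:::\n"
    "Evaluation: "
)

def extract_rating(model_output: str) -> int:
    """Pull the 1-5 score from the 'Total rating:' line the prompt asks for."""
    match = re.search(r"Total rating:\s*([1-5])", model_output)
    if match is None:
        raise ValueError("no 'Total rating:' line found in model output")
    return int(match.group(1))

prompt = PROMPT_TEMPLATE.format(
    instruction="Name the capital of France.",
    reference="The capital of France is Paris.",
    answer="Paris.",
)

# Mocked judge response, for illustration only; in the notebook this text
# would come from the local judge LLM.
fake_output = "The answer is correct and concise.\nTotal rating: 5"
print(extract_rating(fake_output))  # → 5
```

Anchoring the parser on the explicit `Total rating:` marker is what makes the longer rubric-based prompt less noisy to score than a free-form 1-to-100 request.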