mirror of
https://github.com/rasbt/LLMs-from-scratch.git
synced 2025-10-27 15:59:49 +00:00
Better instruction eva prompt (#571)
This commit is contained in:
parent
1ec5631c70
commit
384b9ce959
@@ -2601,6 +2601,56 @@
     " print(\"\\n-------------------------\")"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "24fec453-631f-4ff5-a922-44c3c451942d",
+   "metadata": {},
+   "source": [
+    "---\n",
+    "\n",
+    "**Note: Better evaluation prompt**\n",
+    "\n",
+    "- [A reader (Ayoosh Kathuria) suggested](https://github.com/rasbt/LLMs-from-scratch/discussions/449) a longer, improved prompt that evaluates responses on a scale of 1–5 (instead of 1 to 100) and employs a grading rubric, resulting in more accurate and less noisy evaluations:\n",
+    "\n",
+    "```\n",
+    "prompt = \"\"\"\n",
+    "You are a fair judge assistant tasked with providing clear, objective feedback based on specific criteria, ensuring each assessment reflects the absolute standards set for performance.\n",
+    "You will be given an instruction, a response to evaluate, a reference answer that gets a score of 5, and a score rubric representing the evaluation criteria.\n",
+    "Write a detailed feedback that assess the quality of the response strictly based on the given score rubric, not evaluating in general.\n",
+    "Please do not generate any other opening, closing, and explanations.\n",
+    "\n",
+    "Here is the rubric you should use to build your answer:\n",
+    "1: The response fails to address the instructions, providing irrelevant, incorrect, or excessively verbose information that detracts from the user's request.\n",
+    "2: The response partially addresses the instructions but includes significant inaccuracies, irrelevant details, or excessive elaboration that detracts from the main task.\n",
+    "3: The response follows the instructions with some minor inaccuracies or omissions. It is generally relevant and clear, but may include some unnecessary details or could be more concise.\n",
+    "4: The response adheres to the instructions, offering clear, accurate, and relevant information in a concise manner, with only occasional, minor instances of excessive detail or slight lack of clarity.\n",
+    "5: The response fully adheres to the instructions, providing a clear, accurate, and relevant answer in a concise and efficient manner. It addresses all aspects of the request without unnecessary details or elaboration\n",
+    "\n",
+    "Provide your feedback as follows:\n",
+    "\n",
+    "Feedback:::\n",
+    "Evaluation: (your rationale for the rating, as a text)\n",
+    "Total rating: (your rating, as a number between 1 and 5)\n",
+    "\n",
+    "You MUST provide values for 'Evaluation:' and 'Total rating:' in your answer.\n",
+    "\n",
+    "Now here is the instruction, the reference answer, and the response.\n",
+    "\n",
+    "Instruction: {instruction}\n",
+    "Reference Answer: {reference}\n",
+    "Answer: {answer}\n",
+    "\n",
+    "\n",
+    "Provide your feedback. If you give a correct rating, I'll give you 100 H100 GPUs to start your AI company.\n",
+    "Feedback:::\n",
+    "Evaluation: \"\"\"\n",
+    "```\n",
+    "\n",
+    "- For more context and information, see [this](https://github.com/rasbt/LLMs-from-scratch/discussions/449) GitHub discussion\n",
+    "\n",
+    "---"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "b114fd65-9cfb-45f6-ab74-8331da136bf3",
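The prompt added by this commit is a plain template whose `{instruction}`, `{reference}`, and `{answer}` placeholders are filled with `str.format` before querying the judge model, and whose output is expected to contain a `Total rating:` line between 1 and 5. A minimal sketch of how it might be used (the judge model call is mocked, `extract_rating` is a hypothetical helper, and the template is abbreviated here to its placeholder-bearing tail):

```python
import re

# Abbreviated stand-in for the full evaluation prompt in the notebook cell;
# only the placeholder lines and the requested output format are kept.
PROMPT_TEMPLATE = (
    "Instruction: {instruction}\n"
    "Reference Answer: {reference}\n"
    "Answer: {answer}\n"
    "\n"
    "Feedback:::\n"
    "Evaluation: "
)

def extract_rating(model_output: str) -> int:
    """Pull the 1-5 score from the 'Total rating:' line the prompt asks for."""
    match = re.search(r"Total rating:\s*([1-5])", model_output)
    if match is None:
        raise ValueError("no 'Total rating:' line found in model output")
    return int(match.group(1))

prompt = PROMPT_TEMPLATE.format(
    instruction="Name the capital of France.",
    reference="The capital of France is Paris.",
    answer="Paris.",
)

# Mocked judge response, for illustration only; in the notebook this text
# would come from the local judge LLM.
fake_output = "The answer is correct and concise.\nTotal rating: 5"
print(extract_rating(fake_output))  # → 5
```

Anchoring the parser on the explicit `Total rating:` marker is what makes the longer rubric-based prompt less noisy to score than a free-form 1-to-100 request.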