mirror of
https://github.com/rasbt/LLMs-from-scratch.git
synced 2025-10-28 08:19:21 +00:00
Fix 2 typos in 04_preferene-tuning-with-dpo (#356)
This commit is contained in:
parent
adbcf8c7b6
commit
0dbc203f66
@@ -2774,7 +2774,7 @@
 },
 "source": [
 "- As we can see above, the loss continues to improve, which is a good sign\n",
-"- Based on the downward slope, one might be tempted to train the model a bit further (and readers are encouraged to try this), but not that DPO is prone to collapse, where the model may start generating nonsensical responses\n",
+"- Based on the downward slope, one might be tempted to train the model a bit further (and readers are encouraged to try this), but note that DPO is prone to collapse, where the model may start generating nonsensical responses\n",
 "- Next, let's take a look at the reward margins:"
 ]
 },
@@ -2823,7 +2823,7 @@
 },
 "source": [
 "- As we can see, and as it's desired, the reward margins improve; this mirrors the loss curve and is a good sign\n",
-"- Note that DPO losses and reward margins are valuable metrics to track during training; however, they don't tell the whole store\n",
+"- Note that DPO losses and reward margins are valuable metrics to track during training; however, they don't tell the whole story\n",
 "- Lastly, and most importantly, we have to conduct a qualitative check of the responses\n",
 "- Here, we will look at the response (in addition, you could use an LLM to score the responses similar to chapter 7)"
 ]
 },