mirror of https://github.com/rasbt/LLMs-from-scratch.git (synced 2025-10-30 01:10:33 +00:00)

commit 089901db26
parent b39234fc25

    small figure update
@@ -131,7 +131,7 @@
 "- In other words, DPO focuses on directly optimizing the model's output to align with human preferences or specific objectives\n",
 "- Shown below is the main idea as an overview of how DPO works\n",
 "\n",
-"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/dpo/5.webp\" width=600px>"
+"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/dpo/5.webp?123\" width=600px>"
 ]
 },
 {
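As context for the hunk above: "human preferences" in DPO means pairwise preference data, i.e. triples of a prompt, a preferred (chosen) response, and a dispreferred (rejected) response. Below is a hypothetical sketch of one such entry; the field names are illustrative and not necessarily the schema this notebook uses.

```python
# Hypothetical DPO preference-data entry (field names are illustrative,
# not necessarily the schema used in the notebook).
preference_example = {
    "prompt": "Rewrite in a polite tone: Give me the report.",
    "chosen": "Could you please send me the report?",  # preferred response, y_w
    "rejected": "Give me the report.",                 # dispreferred response, y_l
}
```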
@@ -143,7 +143,7 @@
 "source": [
 "- The concrete equation to implement the DPO loss is shown below; we will revisit the equation when we implement it in Python further down in this code notebook\n",
 "\n",
-"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/dpo/3.webp\" width=600px>"
+"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/dpo/3.webp?123\" width=600px>"
 ]
 },
 {
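For readers viewing this diff without loading the linked figure (dpo/3.webp): the core DPO loss from the paper cited later in the notebook (arXiv:2305.18290), which the figure presumably depicts, is

```latex
\mathcal{L}_{\text{DPO}}(\pi_\theta; \pi_{\text{ref}}) =
  -\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
  \left[ \log \sigma\!\left(
    \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)}
    - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}
  \right) \right]
```

where $\pi_\theta$ is the model being trained, $\pi_{\text{ref}}$ is a frozen reference copy, $\sigma$ is the logistic sigmoid, and $\beta$ controls how far the policy may drift from the reference.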
@@ -1807,7 +1807,7 @@
 "- Note that the DPO loss code below is based on the method proposed in the [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://arxiv.org/abs/2305.18290) paper\n",
 "- For reference, the core DPO equation is shown again below:\n",
 "\n",
-"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/dpo/3.webp\" width=800px>\n",
+"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/dpo/3.webp?123\" width=800px>\n",
 "\n",
 "- In the equation above,\n",
 " - \"expected value\" $\\mathbb{E}$ is statistics jargon and stands for the average or mean value of the random variable (the expression inside the brackets)\n",
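The hunk above notes that the expected value $\mathbb{E}$ is simply the mean over the sampled batch. Here is a minimal sketch of a DPO loss implementation consistent with that equation, assuming per-response log-probabilities (already summed over tokens) for both the policy and the frozen reference model; the function and argument names are illustrative, not the notebook's actual code.

```python
import torch
import torch.nn.functional as F

# Illustrative sketch of the DPO loss (arXiv:2305.18290), not the notebook's code.
def compute_dpo_loss(
    policy_chosen_logprobs: torch.Tensor,    # log pi_theta(y_w | x), one value per example
    policy_rejected_logprobs: torch.Tensor,  # log pi_theta(y_l | x)
    ref_chosen_logprobs: torch.Tensor,       # log pi_ref(y_w | x)
    ref_rejected_logprobs: torch.Tensor,     # log pi_ref(y_l | x)
    beta: float = 0.1,                       # strength of the implicit KL-style penalty
) -> torch.Tensor:
    # Log-ratios between policy and reference for chosen and rejected responses
    chosen_logratios = policy_chosen_logprobs - ref_chosen_logprobs
    rejected_logratios = policy_rejected_logprobs - ref_rejected_logprobs
    # -log sigmoid(beta * margin); the batch .mean() realizes the expectation E
    logits = beta * (chosen_logratios - rejected_logratios)
    return -F.logsigmoid(logits).mean()
```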
@@ -3088,7 +3088,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.11.9"
+"version": "3.10.6"
 }
 },
 "nbformat": 4,