diff --git a/ch07/04_preference-tuning-with-dpo/dpo-from-scratch.ipynb b/ch07/04_preference-tuning-with-dpo/dpo-from-scratch.ipynb
index c8773b1..29c5d6e 100644
--- a/ch07/04_preference-tuning-with-dpo/dpo-from-scratch.ipynb
+++ b/ch07/04_preference-tuning-with-dpo/dpo-from-scratch.ipynb
@@ -131,7 +131,7 @@
     "- In other words, DPO focuses on directly optimizing the model's output to align with human preferences or specific objectives\n",
     "- Shown below is the main idea as an overview of how DPO works\n",
     "\n",
-    ""
+    ""
    ]
   },
   {
@@ -143,7 +143,7 @@
    "source": [
     "- The concrete equation to implement the DPO loss is shown below; we will revisit the equation when we implement it in Python further down in this code notebook\n",
     "\n",
-    ""
+    ""
    ]
   },
   {
@@ -1807,7 +1807,7 @@
     "- Note that the DPO loss code below is based on the method proposed in the [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://arxiv.org/abs/2305.18290) paper\n",
     "- For reference, the core DPO equation is shown again below:\n",
     "\n",
-    "\n",
+    "\n",
     "\n",
     "- In the equation above,\n",
     "  - \"expected value\" $\\mathbb{E}$ is statistics jargon and stands for the average or mean value of the random variable (the expression inside the brackets)\n",
@@ -3088,7 +3088,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.11.9"
+   "version": "3.10.6"
   }
  },
  "nbformat": 4,
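
The changed cells above reference figures for the DPO overview and the core DPO loss equation, which are not visible in the raw diff. For reviewers, here is a minimal sketch of that loss, following the paper linked in the notebook (arXiv:2305.18290): the negative expected log-sigmoid of the scaled difference between the policy/reference log-ratios of the chosen and rejected responses. The function name `dpo_loss`, its argument names, and the default `beta` are illustrative assumptions and are not necessarily how the notebook itself implements the loss further down.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logprobs, policy_rejected_logprobs,
             reference_chosen_logprobs, reference_rejected_logprobs,
             beta=0.1):
    # Inputs are per-example summed log-probabilities of the chosen and
    # rejected responses under the policy model and the frozen reference model.

    # Log-ratios between policy and reference for each response type.
    chosen_logratios = policy_chosen_logprobs - reference_chosen_logprobs
    rejected_logratios = policy_rejected_logprobs - reference_rejected_logprobs

    # DPO loss: -E[ log sigmoid( beta * (chosen_logratio - rejected_logratio) ) ]
    logits = beta * (chosen_logratios - rejected_logratios)
    return -F.logsigmoid(logits).mean()

# Hypothetical usage with made-up log-probability values:
# loss = dpo_loss(torch.tensor([-12.3]), torch.tensor([-15.8]),
#                 torch.tensor([-13.0]), torch.tensor([-15.1]))
```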