Mirror of https://github.com/rasbt/LLMs-from-scratch.git
Synced 2025-10-31 09:50:23 +00:00
small figure update

Author: rasbt
Commit: 089901db26
Parent: b39234fc25
@@ -131,7 +131,7 @@
     "- In other words, DPO focuses on directly optimizing the model's output to align with human preferences or specific objectives\n",
     "- Shown below is the main idea as an overview of how DPO works\n",
     "\n",
-    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/dpo/5.webp\" width=600px>"
+    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/dpo/5.webp?123\" width=600px>"
    ]
   },
   {
@@ -143,7 +143,7 @@
    "source": [
     "- The concrete equation to implement the DPO loss is shown below; we will revisit the equation when we implement it in Python further down in this code notebook\n",
     "\n",
-    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/dpo/3.webp\" width=600px>"
+    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/dpo/3.webp?123\" width=600px>"
    ]
   },
   {
@@ -1807,7 +1807,7 @@
     "- Note that the DPO loss code below is based on the method proposed in the [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://arxiv.org/abs/2305.18290) paper\n",
     "- For reference, the core DPO equation is shown again below:\n",
     "\n",
-    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/dpo/3.webp\" width=800px>\n",
+    "<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/dpo/3.webp?123\" width=800px>\n",
     "\n",
     "- In the equation above,\n",
     "  - \"expected value\" $\\mathbb{E}$ is statistics jargon and stands for the average or mean value of the random variable (the expression inside the brackets)\n",
@@ -3088,7 +3088,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.11.9"
+   "version": "3.10.6"
   }
  },
  "nbformat": 4,
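For readers viewing this diff as plain text, where the referenced figures (dpo/3.webp and dpo/5.webp) are not visible: the core DPO loss from the paper linked in the hunk above (https://arxiv.org/abs/2305.18290) is

$\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}} \left[ \log \sigma \left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \right) \right]$

Below is a minimal Python sketch of this loss, not the notebook's actual implementation: the function name dpo_loss, the argument names, and the default beta=0.1 are illustrative assumptions, and the per-sequence log-probabilities are assumed to be already summed over the response tokens.

import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logprobs, policy_rejected_logprobs,
             reference_chosen_logprobs, reference_rejected_logprobs,
             beta=0.1):
    """Hypothetical sketch of the DPO loss for a batch of preference pairs.

    Each argument is a 1D tensor of per-sequence log-probabilities
    log pi(y|x) for the chosen (y_w) or rejected (y_l) response under
    the trainable policy or the frozen reference model.
    """
    # beta * [log pi_theta(y_w|x) - log pi_ref(y_w|x)]
    chosen_logratios = beta * (policy_chosen_logprobs - reference_chosen_logprobs)
    # beta * [log pi_theta(y_l|x) - log pi_ref(y_l|x)]
    rejected_logratios = beta * (policy_rejected_logprobs - reference_rejected_logprobs)
    # The batch mean stands in for the expected value E in the equation above
    return -F.logsigmoid(chosen_logratios - rejected_logratios).mean()

# Toy usage with random log-probabilities for a batch of 4 preference pairs
torch.manual_seed(123)
loss = dpo_loss(torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4))
print(loss)  # scalar tensor; a lower value means a larger margin favoring the chosen responses

The frozen reference model in the log-ratios keeps the trained policy from drifting too far from its starting point while it learns to rank chosen responses above rejected ones.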