| Author | Commit | Message | Date |
| --- | --- | --- | --- |
| rasbt | bed5f89378 | fix reward margins plot label in dpo nb | 2025-01-12 14:04:05 -06:00 |
| Sebastian Raschka | 992f3068d1 | Auto download DPO dataset if not already available in path (#479) * Auto download DPO dataset if not already available in path * update tests to account for latest HF transformers release in unit tests * pep 8 | 2025-01-12 12:27:28 -06:00 |
| Sebastian Raschka | 05f2a398b8 | adds no-grad context for reference model to DPO (#473) | 2025-01-07 20:49:01 -06:00 |
| QS | 976c92010c | typo fixed (#468) * typo fixed * only update plot --------- Co-authored-by: rasbt <mail@sebastianraschka.com> | 2025-01-05 09:17:13 -06:00 |
| Jinge Wang | 0dbc203f66 | Fix 2 typos in 04_preferene-tuning-with-dpo (#356) | 2024-09-15 07:36:22 -05:00 |
| rasbt | 7a5771932b | note about logistic sigmoid | 2024-08-06 19:48:30 -05:00 |
| rasbt | 2245f8d9c1 | extend equation description | 2024-08-06 19:46:50 -05:00 |
| rasbt | a65e06ff99 | add more explanations | 2024-08-06 19:45:11 -05:00 |
| rasbt | 089901db26 | small figure update | 2024-08-05 17:57:16 -05:00 |
| Daniel Kleine | dcdf04e3bd | minor DPO fixes (#298) * fixed issues, updated .gitignore * added closing paren * fixed CEL spelling * fixed more minor issues * Update ch07/01_main-chapter-code/ch07.ipynb * Update ch07/04_preference-tuning-with-dpo/dpo-from-scratch.ipynb * Update ch07/04_preference-tuning-with-dpo/dpo-from-scratch.ipynb * Update ch07/04_preference-tuning-with-dpo/dpo-from-scratch.ipynb --------- Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com> | 2024-08-05 08:40:46 -05:00 |
| rasbt | 6030071e3f | update model path | 2024-08-05 07:36:08 -05:00 |
| rasbt | f302f5e8d5 | improve latex rendering in dpo notebook | 2024-08-04 09:19:59 -05:00 |
| Sebastian Raschka | 09dc080cf3 | Direct Preference Optimization from scratch (#294) | 2024-08-04 08:57:36 -05:00 |