9 Commits

Author SHA1 Message Date
Jinge Wang
4210386cec
Fix 2 typos in 04_preferene-tuning-with-dpo (#356) 2024-09-15 07:36:22 -05:00
rasbt
06151a809e
note about logistic sigmoid 2024-08-06 19:48:30 -05:00
rasbt
e810f9f004
extend equation description 2024-08-06 19:46:50 -05:00
rasbt
c8090f30ef
add more explanations 2024-08-06 19:45:11 -05:00
rasbt
36fbc7aa74
small figure update 2024-08-05 17:57:16 -05:00
Daniel Kleine
8318d1f002
minor DPO fixes (#298)
* fixed issues, updated .gitignore

* added closing paren

* fixed CEL spelling

* fixed more minor issues

* Update ch07/01_main-chapter-code/ch07.ipynb

* Update ch07/04_preference-tuning-with-dpo/dpo-from-scratch.ipynb

* Update ch07/04_preference-tuning-with-dpo/dpo-from-scratch.ipynb

* Update ch07/04_preference-tuning-with-dpo/dpo-from-scratch.ipynb

---------

Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>
2024-08-05 08:40:46 -05:00
rasbt
36b9d5e0eb
update model path 2024-08-05 07:36:08 -05:00
rasbt
60aada801b
improve latex rendering in dpo notebook 2024-08-04 09:19:59 -05:00
Sebastian Raschka
52435804eb
Direct Preference Optimization from scratch (#294) 2024-08-04 08:57:36 -05:00