16 Commits

Author SHA1 Message Date
Jinge Wang
4210386cec
Fix 2 typos in 04_preferene-tuning-with-dpo (#356) 2024-09-15 07:36:22 -05:00
rasbt
06151a809e
note about logistic sigmoid 2024-08-06 19:48:30 -05:00
rasbt
e810f9f004
extend equation description 2024-08-06 19:46:50 -05:00
rasbt
c8090f30ef
add more explanations 2024-08-06 19:45:11 -05:00
rasbt
36fbc7aa74
small figure update 2024-08-05 17:57:16 -05:00
Daniel Kleine
8318d1f002
minor DPO fixes (#298)
* fixed issues, updated .gitignore
* added closing paren
* fixed CEL spelling
* fixed more minor issues
* Update ch07/01_main-chapter-code/ch07.ipynb
* Update ch07/04_preference-tuning-with-dpo/dpo-from-scratch.ipynb
* Update ch07/04_preference-tuning-with-dpo/dpo-from-scratch.ipynb
* Update ch07/04_preference-tuning-with-dpo/dpo-from-scratch.ipynb
---------

Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>
2024-08-05 08:40:46 -05:00
rasbt
36b9d5e0eb
update model path 2024-08-05 07:36:08 -05:00
rasbt
60aada801b
improve latex rendering in dpo notebook 2024-08-04 09:19:59 -05:00
Sebastian Raschka
52435804eb
Direct Preference Optimization from scratch (#294) 2024-08-04 08:57:36 -05:00
rasbt
a7869ad2bf
Fix 8-billion-parameter spelling 2024-07-28 10:48:56 -05:00
Daniel Kleine
9a3b04f92f
fixed typos and formatting (#291) 2024-07-28 10:04:33 -05:00
rasbt
c87e4364b7
formatting 2024-07-27 09:51:24 -05:00
Sebastian Raschka
99af403b9f
Generate preference dataset with Llama 3.1 70B (#289) 2024-07-27 09:44:04 -05:00
Sebastian Raschka
f78ad1f95b
Update README.md 2024-06-23 08:25:01 -05:00
rasbt
4ac480c9ae
add instruction dataset 2024-06-08 10:38:41 -05:00
rasbt
30ebd7427c
Ollama-based model evaluation 2024-06-05 08:21:28 -05:00