Sebastian Raschka
|
d6c3990c57
|
Training on MPS in PyTorch 2.9 (#900)
* Training on MPS in PyTorch 2.9
* update
|
2025-11-01 16:55:09 -05:00 |
|
Sebastian Raschka
|
7ca7c47e4a
|
Make quote style consistent (#891)
|
2025-10-21 19:42:33 -05:00 |
|
Sebastian Raschka
|
7bd263144e
|
Switch from urllib to requests to improve reliability (#867)
* Switch from urllib to requests to improve reliability
* Keep ruff linter-specific
* update
* update
* update
|
2025-10-07 15:22:59 -05:00 |
|
Sebastian Raschka
|
9bc827ea7e
|
Numerically stable generate on mps (#849)
* Numerically stable generate on mps
* add file
|
2025-09-26 22:42:44 -05:00 |
|
Andreas Yin
|
8e170312fe
|
fix: correct role of the beta hyperparameter on the DPO loss (#818)
Increasing beta leads to less divergence between the new model and the reference model.
|
2025-09-12 20:21:38 -05:00 |
|
Sebastian Raschka
|
adaf4faaae
|
Dpo vocab size clarification (#628)
* Llama3 from scratch improvements
* vocab size should be 50257 not 50256
* restore
|
2025-04-18 17:20:56 -05:00 |
|
casinca
|
1b242d01a5
|
Minor DPO fixes (#617)
* minor dpo fixes
* Update dpo-from-scratch.ipynb
metadata diff
|
2025-04-16 12:56:49 -05:00 |
|
Sebastian Raschka
|
c21bfe4a23
|
Add PyPI package (#576)
* Add PyPI package
* fixes
* fixes
|
2025-03-23 19:28:49 -05:00 |
|
rasbt
|
b524afe3da
|
fix reward margins plot label in dpo nb
|
2025-01-12 14:04:05 -06:00 |
|
Sebastian Raschka
|
4bfbcd069d
|
Auto download DPO dataset if not already available in path (#479)
* Auto download DPO dataset if not already available in path
* update tests to account for latest HF transformers release in unit tests
* pep 8
|
2025-01-12 12:27:28 -06:00 |
|
Sebastian Raschka
|
a48f9c7fe2
|
adds no-grad context for reference model to DPO (#473)
|
2025-01-07 20:49:01 -06:00 |
|
QS
|
9b95557ba2
|
typo fixed (#468)
* typo fixed
* only update plot
---------
Co-authored-by: rasbt <mail@sebastianraschka.com>
|
2025-01-05 09:17:13 -06:00 |
|
Jinge Wang
|
4210386cec
|
Fix 2 typos in 04_preferene-tuning-with-dpo (#356)
|
2024-09-15 07:36:22 -05:00 |
|
rasbt
|
06151a809e
|
note about logistic sigmoid
|
2024-08-06 19:48:30 -05:00 |
|
rasbt
|
e810f9f004
|
extend equation description
|
2024-08-06 19:46:50 -05:00 |
|
rasbt
|
c8090f30ef
|
add more explanations
|
2024-08-06 19:45:11 -05:00 |
|
rasbt
|
36fbc7aa74
|
small figure update
|
2024-08-05 17:57:16 -05:00 |
|
Daniel Kleine
|
8318d1f002
|
minor DPO fixes (#298)
* fixed issues, updated .gitignore
* added closing paren
* fixed CEL spelling
* fixed more minor issues
* Update ch07/01_main-chapter-code/ch07.ipynb
* Update ch07/04_preference-tuning-with-dpo/dpo-from-scratch.ipynb
* Update ch07/04_preference-tuning-with-dpo/dpo-from-scratch.ipynb
* Update ch07/04_preference-tuning-with-dpo/dpo-from-scratch.ipynb
---------
Co-authored-by: Sebastian Raschka <mail@sebastianraschka.com>
|
2024-08-05 08:40:46 -05:00 |
|
rasbt
|
36b9d5e0eb
|
update model path
|
2024-08-05 07:36:08 -05:00 |
|
rasbt
|
60aada801b
|
improve latex rendering in dpo notebook
|
2024-08-04 09:19:59 -05:00 |
|
Sebastian Raschka
|
52435804eb
|
Direct Preference Optimization from scratch (#294)
|
2024-08-04 08:57:36 -05:00 |
|
rasbt
|
a7869ad2bf
|
Fix 8-billion-parameter spelling
|
2024-07-28 10:48:56 -05:00 |
|
Daniel Kleine
|
9a3b04f92f
|
fixed typos and formatting (#291)
|
2024-07-28 10:04:33 -05:00 |
|
rasbt
|
c87e4364b7
|
formatting
|
2024-07-27 09:51:24 -05:00 |
|
Sebastian Raschka
|
99af403b9f
|
Generate preference dataset with Llama 3.1 70B (#289)
|
2024-07-27 09:44:04 -05:00 |
|
Sebastian Raschka
|
f78ad1f95b
|
Update README.md
|
2024-06-23 08:25:01 -05:00 |
|
rasbt
|
4ac480c9ae
|
add instruction dataset
|
2024-06-08 10:38:41 -05:00 |
|
rasbt
|
30ebd7427c
|
Ollama-based model evaluation
|
2024-06-05 08:21:28 -05:00 |
|