diff --git a/ch07/04_preference-tuning-with-dpo/README.md b/ch07/04_preference-tuning-with-dpo/README.md
index 330a658..bbbcc5e 100644
--- a/ch07/04_preference-tuning-with-dpo/README.md
+++ b/ch07/04_preference-tuning-with-dpo/README.md
@@ -1,3 +1,8 @@
 # Chapter 7: Finetuning to Follow Instructions
 
-In progress ...
\ No newline at end of file
+In progress ...
+
+In the meantime, see
+
+- LLM Training: RLHF and Its Alternatives, [https://magazine.sebastianraschka.com/p/llm-training-rlhf-and-its-alternatives](https://magazine.sebastianraschka.com/p/llm-training-rlhf-and-its-alternatives)
+- Tips for LLM Pretraining and Evaluating Reward Models, [https://sebastianraschka.com/blog/2024/research-papers-in-march-2024.html](https://sebastianraschka.com/blog/2024/research-papers-in-march-2024.html)