Direct Preference Optimization: Your Language Model is Secretly a Reward Model Review

2025. 6. 10. 14:49

Background