1. Pan J, Shen W, Huang S, Zhou Q, Zhang Y. Pre-DPO: Improving Data Utilization in Direct Preference Optimization Using a Guiding Reference Model. AAAI [Internet]. 2026 Mar. 14 [cited 2026 May 14];40(38):32646-54. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/40542