[1] J. Pan, W. Shen, S. Huang, Q. Zhou, and Y. Zhang, “Pre-DPO: Improving Data Utilization in Direct Preference Optimization Using a Guiding Reference Model,” AAAI, vol. 40, no. 38, pp. 32646–32654, Mar. 2026.