(1) Pan, J.; Shen, W.; Huang, S.; Zhou, Q.; Zhang, Y. Pre-DPO: Improving Data Utilization in Direct Preference Optimization Using a Guiding Reference Model. AAAI 2026, 40, 32646-32654.