Pan, J., Shen, W., Huang, S., Zhou, Q., & Zhang, Y. (2026). Pre-DPO: Improving Data Utilization in Direct Preference Optimization Using a Guiding Reference Model. Proceedings of the AAAI Conference on Artificial Intelligence, 40(38), 32646–32654. https://doi.org/10.1609/aaai.v40i38.40542