[1] X. Lou, J. Zhang, J. Xie, L. Liu, D. Yan, and K. Huang, “Sequential Preference Optimization: Multi-Dimensional Preference Alignment with Implicit Reward Modeling,” AAAI, vol. 39, no. 26, pp. 27509–27517, Apr. 2025.