(1)
Chen, Z.; Liu, F.; Zhu, X.; Wang, H.; Li, J.; Qi, Y.; Ghavamzadeh, M. Preference Optimization via Contrastive Divergence: Your Policy Is Secretly an NLL Estimator. AAAI 2026, 40, 37286-37294.