Chen, Z., Liu, F., Zhu, X., Wang, H., Li, J., Qi, Y., & Ghavamzadeh, M. (2026). Preference Optimization via Contrastive Divergence: Your Policy Is Secretly an NLL Estimator. Proceedings of the AAAI Conference on Artificial Intelligence, 40(44), 37286–37294. https://doi.org/10.1609/aaai.v40i44.41060