Chen, Z. (2026) “Preference Optimization via Contrastive Divergence: Your Policy Is Secretly an NLL Estimator”, Proceedings of the AAAI Conference on Artificial Intelligence, 40(44), pp. 37286–37294. doi: 10.1609/aaai.v40i44.41060.