[1] Z. Chen, “Preference Optimization via Contrastive Divergence: Your Policy Is Secretly an NLL Estimator”, AAAI, vol. 40, no. 44, pp. 37286–37294, Mar. 2026.