Ru, Y., Huang, Y., & Zhang, X. (2026). RMO: Towards Better LLM Alignment via Reshaping Reward Margin Distributions. Proceedings of the AAAI Conference on Artificial Intelligence, 40(39), 32851-32859. https://doi.org/10.1609/aaai.v40i39.40565