Xu, Y., Ye, X., Chen, Y., & Zhang, Q. (2026). When Human Preferences Flip: An Instance-Dependent Robust Loss for RLHF. Proceedings of the AAAI Conference on Artificial Intelligence, 40(44), 38057–38065. https://doi.org/10.1609/aaai.v40i44.41143