Xu, Y. (2026) “When Human Preferences Flip: An Instance-Dependent Robust Loss for RLHF”, Proceedings of the AAAI Conference on Artificial Intelligence, 40(44), pp. 38057–38065. doi: 10.1609/aaai.v40i44.41143.