Feng, D. (2025) “LEGEND: Leveraging Representation Engineering to Annotate Safety Margin for Preference Datasets”, Proceedings of the AAAI Conference on Artificial Intelligence, 39(26), pp. 27277–27285. doi: 10.1609/aaai.v39i26.34937.