Feng, D., Qin, B., Huang, C., Huang, Y., Zhang, Z., & Lei, W. (2025). LEGEND: Leveraging Representation Engineering to Annotate Safety Margin for Preference Datasets. Proceedings of the AAAI Conference on Artificial Intelligence, 39(26), 27277–27285. https://doi.org/10.1609/aaai.v39i26.34937