(1)
Feng, D.; Qin, B.; Huang, C.; Huang, Y.; Zhang, Z.; Lei, W. LEGEND: Leveraging Representation Engineering to Annotate Safety Margin for Preference Datasets. AAAI 2025, 39, 27277-27285.