(1) Zhang, X.; Zhao, Z.; Shi, W.; Xu, K.; Huang, D.; Hu, X. Safety Alignment of Large Language Models via Contrasting Safe and Harmful Distributions. AAAI 2026, 40, 34827–34835.