Zhang, X., Zhao, Z., Shi, W., Xu, K., Huang, D., & Hu, X. (2026). Safety Alignment of Large Language Models via Contrasting Safe and Harmful Distributions. Proceedings of the AAAI Conference on Artificial Intelligence, 40(41), 34827–34835. https://doi.org/10.1609/aaai.v40i41.40785