Das Antar, Anindya, Xun Huan, and Nikola Banovic. “‘Do Your Guardrails Even Guard?’ Method for Evaluating Effectiveness of Moderation Guardrails in Aligning LLM Outputs With Expert User Expectations.” Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society 8, no. 1 (October 15, 2025): 705–718. Accessed May 27, 2026. https://ojs.aaai.org/index.php/AIES/article/view/36583.