[1]
Das Antar, A. et al. 2025. "Do Your Guardrails Even Guard?" Method for Evaluating Effectiveness of Moderation Guardrails in Aligning LLM Outputs with Expert User Expectations. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society. 8, 1 (Oct. 2025), 705–718. DOI:https://doi.org/10.1609/aies.v8i1.36583.