1.
Das Antar A, Huan X, Banovic N. "Do Your Guardrails Even Guard?" Method for Evaluating Effectiveness of Moderation Guardrails in Aligning LLM Outputs with Expert User Expectations. AIES [Internet]. 2025 Oct. 15 [cited 2026 May 27];8(1):705-18. Available from: https://ojs.aaai.org/index.php/AIES/article/view/36583