(1)
Das Antar, A.; Huan, X.; Banovic, N. "Do Your Guardrails Even Guard?" Method for Evaluating Effectiveness of Moderation Guardrails in Aligning LLM Outputs With Expert User Expectations. AIES 2025, 8, 705-718.