[1]

A. Das Antar, X. Huan, and N. Banovic, “‘Do Your Guardrails Even Guard?’ Method for Evaluating Effectiveness of Moderation Guardrails in Aligning LLM Outputs with Expert User Expectations”, AIES, vol. 8, no. 1, pp. 705–718, Oct. 2025.