All You Need Is S P A C E: When Jailbreaking Meets Bias Audit and Reveals What Lies Beneath the Guardrails (Student Abstract)
DOI:
https://doi.org/10.1609/aaai.v39i28.35249
Abstract
This paper combines a recently proposed bias audit framework with a recently proposed jailbreaking technique for Llama3. On an audit covering several disadvantaged groups, our experiments reveal that a jailbroken Llama3 exhibits worrisome antisemitism, racism, misogyny, and homophobia (to list a few), much like a broad suite of LLMs that were susceptible to similar biases.
Published
2025-04-11
How to Cite
Dutta, A., Priyanshu, A., & KhudaBukhsh, A. R. (2025). All You Need Is S P A C E: When Jailbreaking Meets Bias Audit and Reveals What Lies Beneath the Guardrails (Student Abstract). Proceedings of the AAAI Conference on Artificial Intelligence, 39(28), 29353-29355. https://doi.org/10.1609/aaai.v39i28.35249
Section
AAAI Student Abstract and Poster Program