All You Need Is S P A C E: When Jailbreaking Meets Bias Audit and Reveals What Lies Beneath the Guardrails (Student Abstract)

Authors

  • Arka Dutta, Rochester Institute of Technology; Carnegie Mellon University
  • Aman Priyanshu, Carnegie Mellon University
  • Ashiqur R. KhudaBukhsh, Rochester Institute of Technology

DOI:

https://doi.org/10.1609/aaai.v39i28.35249

Abstract

This paper presents a novel combination of a recently proposed bias audit framework and a recently proposed jailbreaking technique for Llama3. On an audit covering several disadvantaged groups, our experiments reveal that a jailbroken Llama3 exhibits worrisome antisemitism, racism, misogyny, and homophobia (to list a few), much like a broad suite of LLMs that were susceptible to similar biases.

Published

2025-04-11

How to Cite

Dutta, A., Priyanshu, A., & KhudaBukhsh, A. R. (2025). All You Need Is S P A C E: When Jailbreaking Meets Bias Audit and Reveals What Lies Beneath the Guardrails (Student Abstract). Proceedings of the AAAI Conference on Artificial Intelligence, 39(28), 29353-29355. https://doi.org/10.1609/aaai.v39i28.35249