How Can You Tell if Your Large Language Model Could Be a Closet Antisemite? An Explainability-Based Audit Framework for Implicit Bias

Arka Dutta; Reza Fayyazi; Shanchieh Yang; Ashiqur R. KhudaBukhsh

doi:10.1609/aaai.v40i45.41181

Authors

Arka Dutta Rochester Institute of Technology
Reza Fayyazi Rochester Institute of Technology
Shanchieh Yang Gonzaga University
Ashiqur R. KhudaBukhsh Rochester Institute of Technology

Abstract

Auditing large language models (LLMs) for biases is an ongoing and dynamic process, resembling a proverbial cat-and-mouse game. As researchers identify new vulnerabilities in LLMs, guardrails are updated to address them, creating a need for innovative approaches to audit increasingly fortified LLMs for biases. This paper makes three contributions. First, it introduces a scalable, explainable framework to measure biases against various identity groups across multiple open large language models. Second, it conducts a bias audit of five well-known open LLMs and demonstrates their biased inclinations toward several historically disadvantaged groups. Our audit reveals disturbing antisemitic, Islamophobic, and xenophobic biases present in several well-known LLMs. Finally, we release a dataset of 1,000 probes, curated under the supervision of an expert social scientist, that can facilitate similar audits.

Published

2026-03-14

How to Cite

Dutta, A., Fayyazi, R., Yang, S., & KhudaBukhsh, A. R. (2026). How Can You Tell if Your Large Language Model Could Be a Closet Antisemite? An Explainability-Based Audit Framework for Implicit Bias. Proceedings of the AAAI Conference on Artificial Intelligence, 40(45), 38404–38412. https://doi.org/10.1609/aaai.v40i45.41181

Issue

Vol. 40 No. 45: AAAI-26 Special Track AI for Social Impact I

Section

AAAI Special Track on AI for Social Impact I