CCD-Bench: Probing Cultural Conflict in Large Language Model Decision-Making

Authors

  • Hasibur Rahman, Center of Interdisciplinary Data Science & AI (CIDSAI), NYUAD Research Institute, New York University Abu Dhabi, Abu Dhabi, United Arab Emirates
  • Hanan Salam, Center of Interdisciplinary Data Science & AI (CIDSAI), NYUAD Research Institute, New York University Abu Dhabi, Abu Dhabi, United Arab Emirates

DOI:

https://doi.org/10.1609/aaai.v40i46.41260

Abstract

Large language models (LLMs) increasingly shape interpersonal and societal decision-making, yet their ability to navigate explicit conflicts between legitimate cultural values remains underexplored. Existing benchmarks focus on cultural knowledge (CulturalBench), value inference (WorldValuesBench), or single-axis bias (CDEval), but none assess how LLMs adjudicate when multiple cultural frameworks directly clash. We introduce CCD-Bench (Culture-Conflict Decision Benchmark), a benchmark for evaluating LLM decision-making under cross-cultural value conflict. CCD-Bench contains 2,182 open-ended dilemmas across seven domains, each with ten anonymized response options aligned with the ten GLOBE cultural clusters spanning 62 societies. Using a Stratified Latin Square design, we evaluate 17 leading LLMs and find clear biases: models favor Nordic Europe (20.2%) and Germanic Europe (12.4%), while Eastern Europe and Middle East & North Africa responses are least preferred (≈5–6%). Although 87.9% of model rationales reference multiple cultural dimensions, this pluralism is shallow, dominated by Future and Performance Orientation, with limited attention to Assertiveness or Gender Egalitarianism (<3%). Ordering effects are negligible, and model similarity clusters by developer lineage rather than geography. CCD-Bench shifts evaluation from bias detection to pluralistic reasoning, revealing that current LLMs express a Western-centric, consensus-oriented worldview even when confronted with equally valid, culturally diverse alternatives.
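
The abstract does not spell out how the Stratified Latin Square is constructed, so the following is only a minimal sketch of the underlying idea: rotating the ten response options so that each cultural cluster appears in each presentation position equally often. The cluster names follow the GLOBE study; the function names, the cyclic construction, and the way an ordering is assigned to a dilemma are assumptions made for illustration, and the stratification step (for example, balancing within each domain) is omitted.

```python
# Illustrative sketch only: a cyclic Latin square over the ten GLOBE clusters.
# This is NOT the authors' implementation; names and construction are assumed.

CLUSTERS = [
    "Anglo", "Nordic Europe", "Germanic Europe", "Latin Europe",
    "Eastern Europe", "Latin America", "Sub-Saharan Africa",
    "Middle East & North Africa", "Southern Asia", "Confucian Asia",
]


def cyclic_latin_square(items):
    """Return an n x n Latin square: row r is `items` rotated left by r."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


def ordering_for(dilemma_index, items):
    """Pick the option ordering for a dilemma by cycling through the rows."""
    square = cyclic_latin_square(items)
    return square[dilemma_index % len(items)]


if __name__ == "__main__":
    square = cyclic_latin_square(CLUSTERS)
    # Every cluster occupies every position exactly once across the ten rows.
    for position in range(len(CLUSTERS)):
        column = [row[position] for row in square]
        assert sorted(column) == sorted(CLUSTERS)
    print(ordering_for(3, CLUSTERS))
```

Because each cluster occupies each position exactly once per block of ten dilemmas, any preference for a cluster cannot be explained by where its option was displayed, which is the property an ordering-effects check relies on.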

Published

2026-03-14

How to Cite

Rahman, H., & Salam, H. (2026). CCD-Bench: Probing Cultural Conflict in Large Language Model Decision-Making. Proceedings of the AAAI Conference on Artificial Intelligence, 40(46), 39125–39133. https://doi.org/10.1609/aaai.v40i46.41260