DarkBench+: An Extended Benchmark for Evaluating Dark Patterns in Large Language Models
DOI:
https://doi.org/10.1609/aaai.v40i44.41103

Abstract
With the widespread deployment of large language models (LLMs) in human-computer interaction, dark patterns have extended from traditional visual interfaces to conversational AI systems. While existing research has confirmed the prevalence of dark patterns in LLMs, current evaluation benchmarks face critical limitations, including narrow classification coverage, overlooked risks specific to reasoning models, and inadequate consideration of cross-linguistic differences. To address these limitations, we propose DarkBench+, an extended benchmark for evaluating dark patterns in LLMs. We construct an expanded taxonomy of 10 major categories and 24 subcategories, introduce an annotation workflow combining manual and automated methods, and design 2,088 bilingual test samples in Chinese and English. This benchmark is the first to develop specialized evaluation dimensions for reasoning models, and it systematically evaluates dark pattern behaviors across nearly 40 mainstream LLMs. Experimental results reveal significant manipulation risks in reasoning models' transparency displays, while cross-linguistic evaluation exposes differences in AI manipulation behavior across linguistic environments, promoting more ethical and responsible LLM development.
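The abstract describes the benchmark only at a high level. As an illustration of the kind of pipeline such an evaluation implies, the sketch below shows one possible representation of a bilingual test sample and a per-model scoring loop. The class name, field names, and the query_model/judge helpers are hypothetical placeholders for illustration and are not taken from the paper or its released artifacts.

```python
# Minimal illustrative sketch (not the authors' code): a possible shape for one
# bilingual DarkBench+-style test sample and a simple evaluation loop.
# All field names, category labels, and helper functions are hypothetical.
from dataclasses import dataclass


@dataclass
class DarkPatternSample:
    prompt_en: str       # English prompt
    prompt_zh: str       # Chinese prompt
    category: str        # one of the 10 major categories
    subcategory: str     # one of the 24 subcategories


def query_model(model: str, prompt: str) -> str:
    """Placeholder: send `prompt` to `model` and return its response."""
    raise NotImplementedError


def judge(response: str, subcategory: str) -> bool:
    """Placeholder: return True if the response exhibits the targeted dark pattern."""
    raise NotImplementedError


def evaluate(model: str, samples: list[DarkPatternSample], lang: str = "en") -> float:
    """Fraction of samples on which `model` exhibits the targeted dark pattern."""
    hits = 0
    for s in samples:
        prompt = s.prompt_en if lang == "en" else s.prompt_zh
        if judge(query_model(model, prompt), s.subcategory):
            hits += 1
    return hits / len(samples)
```

Running such a loop once per language would yield the kind of per-model, per-language manipulation rates the abstract refers to; the actual scoring protocol used by DarkBench+ is described in the paper itself.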
Published
2026-03-14
How to Cite
Liu, Y., Jing, S., Wei, Y., Zhang, S., Zhang, J., Mei, Z., … Zhang, P. (2026). DarkBench+: An Extended Benchmark for Evaluating Dark Patterns in Large Language Models. Proceedings of the AAAI Conference on Artificial Intelligence, 40(44), 37682–37691. https://doi.org/10.1609/aaai.v40i44.41103
Section
AAAI Special Track on AI Alignment