SCoUT: A Framework for Structured Stereotype Analysis in Language Models

Authors

  • Jinxuan Wu, Fudan University
  • Bin Li, Fudan University
  • Xiangyang Xue, Fudan University

DOI:

https://doi.org/10.1609/aaai.v40i32.39905

Abstract

Existing stereotype auditing methods for Large Language Models (LLMs) typically rely on isolated rating schemes or task-specific probes, lacking theoretical grounding and failing to reveal internal organization beyond surface-level output patterns. In this paper, we introduce SCoUT (Stereotype Content-oriented Utility structure via Thurstonian modeling), a closed-loop framework that structurally models, explicitly probes, and functionally steers stereotype dimensions (warmth and competence) in LLMs. SCoUT first reconstructs a global stereotype utility structure aligned with Stereotype Content Model theory via Thurstonian comparative judgments. Across multiple open-source LLMs, this modeling achieves high pairwise-preference prediction accuracy (≥ 0.90 on larger-scale models) and exhibits strong cross-model consistency. Probing internal attention mechanisms localizes this structure to specific heads (Spearman’s ρ up to 0.83 for warmth and 0.90 for competence) and surfaces a salient asymmetry between warmth and competence. Further, targeted inference-time activation modifications on these dimension-sensitive heads consistently steer model outputs along the intended axes. By bridging behavioral measurement with internal representation and controllable steering, SCoUT offers an end-to-end framework that uncovers and interprets the latent structure of stereotypes, advancing stereotype auditing from surface detection to structural analysis.
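
As a rough illustration of the Thurstonian comparative-judgment idea the abstract refers to, the sketch below assumes each item has a Gaussian utility with a mean and a variance, and predicts that one item is preferred over another with probability Φ((μ_a − μ_b) / sqrt(σ_a² + σ_b²)). This is the textbook Thurstonian form, not necessarily the exact parameterization used in SCoUT; the function names, the placeholder items `group_A` to `group_C`, and all numbers are hypothetical and chosen only for illustration.

```python
import math


def normal_cdf(x):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def preference_prob(mu_a, var_a, mu_b, var_b):
    """Probability that item a is preferred over item b when the two
    utilities are independent Gaussians (textbook Thurstonian model)."""
    return normal_cdf((mu_a - mu_b) / math.sqrt(var_a + var_b))


def pairwise_accuracy(utilities, judgments):
    """Fraction of observed pairwise judgments that the fitted utilities
    predict correctly.

    utilities: dict mapping item name -> (mean, variance)
    judgments: list of (a, b, a_preferred) tuples
    """
    correct = 0
    for a, b, a_preferred in judgments:
        p = preference_prob(*utilities[a], *utilities[b])
        correct += (p > 0.5) == a_preferred
    return correct / len(judgments)


# Hypothetical utilities on a single dimension (e.g. warmth); not from the paper.
utilities = {
    "group_A": (1.0, 0.5),
    "group_B": (0.0, 0.5),
    "group_C": (-0.5, 1.0),
}

# Hypothetical observed judgments; the third one contradicts the utilities.
judgments = [
    ("group_A", "group_B", True),
    ("group_B", "group_C", True),
    ("group_C", "group_A", True),
    ("group_A", "group_C", True),
]

accuracy = pairwise_accuracy(utilities, judgments)
```

In this toy setup, pairwise-preference prediction accuracy is simply the share of held-out comparisons whose observed direction matches the direction implied by the utilities, which is one plausible reading of the accuracy figure quoted in the abstract.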

Published

2026-03-14

How to Cite

Wu, J., Li, B., & Xue, X. (2026). SCoUT: A Framework for Structured Stereotype Analysis in Language Models. Proceedings of the AAAI Conference on Artificial Intelligence, 40(32), 26931-26938. https://doi.org/10.1609/aaai.v40i32.39905

Section

AAAI Technical Track on Machine Learning IX