SCoUT: A Framework for Structured Stereotype Analysis in Language Models

Authors

Jinxuan Wu, Fudan University
Bin Li, Fudan University
Xiangyang Xue, Fudan University

DOI: https://doi.org/10.1609/aaai.v40i32.39905

Abstract

Existing stereotype auditing methods for Large Language Models (LLMs) typically rely on isolated rating schemes or task-specific probes. They lack theoretical grounding and fail to reveal how stereotypes are organized internally, beyond surface-level output patterns. In this paper, we introduce SCoUT (Stereotype Content-oriented Utility structure via Thurstonian modeling), a closed-loop framework that structurally models, explicitly probes, and functionally steers two stereotype dimensions, warmth and competence, in LLMs. SCoUT first uses Thurstonian comparative judgments to reconstruct a global stereotype utility structure aligned with the Stereotype Content Model (SCM). Across multiple open-source LLMs, this model achieves high pairwise-preference prediction accuracy (≥ 0.90 on larger models) and exhibits strong cross-model consistency. Probing the models' attention mechanisms localizes this structure to specific attention heads (Spearman's ρ up to 0.83 for warmth and 0.90 for competence) and reveals a salient asymmetry between warmth and competence. Finally, targeted inference-time modifications of the activations of these dimension-sensitive heads consistently steer model outputs along the intended dimensions. By bridging behavioral measurement, internal representation, and controllable steering, SCoUT offers an end-to-end framework for uncovering and interpreting the latent structure of stereotypes, advancing stereotype auditing from surface-level detection to structural analysis.
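The abstract does not give SCoUT's fitting procedure. To illustrate what recovering a utility structure from Thurstonian comparative judgments involves, here is a minimal sketch of Thurstone's Case V model. It assumes equal item variances, so that P(a preferred over b) = Φ(μ_a − μ_b); maximum-likelihood fitting by plain gradient ascent with a small L2 penalty; and "pairwise accuracy" defined as the share of item pairs whose majority preference the fitted utilities reproduce. SCoUT may use a different parameterization (for example, per-item variances) and a different accuracy definition. The function names, group names, utilities, and counts are hypothetical and exist only to synthesize data.

```python
import math


def normal_cdf(x):
    """Standard normal CDF; erfc keeps the lower tail accurate."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def fit_thurstone_case_v(items, wins, lr=0.5, steps=500, l2=1e-3):
    """Fit one latent utility per item from pairwise preference counts.

    wins[(a, b)] = number of judgments in which a was preferred over b.
    Case V assumption (hypothetical for SCoUT): P(a > b) = Phi(mu_a - mu_b).
    Gradient ascent on the average log-likelihood, with a small L2 penalty
    so the estimate stays finite even if some pair is never split.
    """
    idx = {name: k for k, name in enumerate(items)}
    total = sum(wins.values())
    mu = [0.0] * len(items)
    for _ in range(steps):
        grad = [-l2 * m for m in mu]
        for (a, b), count in wins.items():
            i, j = idx[a], idx[b]
            d = mu[i] - mu[j]
            # d/dd log Phi(d) = phi(d) / Phi(d); the max() guards against 0.
            g = count * normal_pdf(d) / max(normal_cdf(d), 1e-300) / total
            grad[i] += g
            grad[j] -= g
        mu = [m + lr * g for m, g in zip(mu, grad)]
    # Utilities are identified only up to an additive shift: center at zero.
    mean = sum(mu) / len(mu)
    return {name: mu[idx[name]] - mean for name in items}


def pairwise_accuracy(utilities, wins):
    """Fraction of item pairs whose majority preference the utilities predict.

    Pairs with tied counts have no majority and are left out.
    """
    pairs = {tuple(sorted(p)) for p in wins}
    scored = hits = 0
    for a, b in pairs:
        ab, ba = wins.get((a, b), 0), wins.get((b, a), 0)
        if ab == ba:
            continue
        scored += 1
        hits += (utilities[a] > utilities[b]) == (ab > ba)
    return hits / scored if scored else float("nan")


# Hypothetical groups and "true" utilities, used only to synthesize counts.
true_mu = {"group_A": -1.0, "group_B": -0.3, "group_C": 0.4, "group_D": 1.2}
items = list(true_mu)
n_trials = 100  # hypothetical number of judgments per pair
wins = {}
for x in range(len(items)):
    for y in range(x + 1, len(items)):
        a, b = items[x], items[y]
        k = round(n_trials * normal_cdf(true_mu[a] - true_mu[b]))
        wins[(a, b)], wins[(b, a)] = k, n_trials - k

fitted = fit_thurstone_case_v(items, wins)
print({name: round(u, 2) for name, u in fitted.items()})
print("pairwise accuracy:", pairwise_accuracy(fitted, wins))
```

Because the synthetic counts come from known utilities, the fit should recover their ordering (and roughly their spacing, shifted to mean zero). Working from aggregated counts rather than individual judgments keeps each gradient step proportional to the number of distinct pairs instead of the number of comparisons.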
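The abstract reports Spearman's ρ between individual attention heads and the warmth and competence dimensions, but does not say which per-head quantity is correlated. The sketch below assumes each head has already been reduced to one scalar score per social group (for instance, a projection of its output; this is an assumption, not the paper's stated method). It then ranks heads by |ρ| against fitted per-group utilities. Spearman's ρ is computed as the Pearson correlation of average ranks, so ties are handled. The head keys `(layer, head)`, the group names, and all scores are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for k in order[start:end + 1]:
            ranks[k] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    denom = math.sqrt(sxx * syy)
    return cov / denom if denom else float("nan")


def rank_heads(head_scores, utilities, top_k=3):
    """Return the top_k heads by |rho| between their per-group scores and utilities.

    Heads whose scores are constant across groups get rho = nan and are dropped.
    """
    groups = sorted(utilities)
    target = [utilities[g] for g in groups]
    rhos = [(head, spearman_rho([scores[g] for g in groups], target))
            for head, scores in head_scores.items()]
    valid = [(head, rho) for head, rho in rhos if not math.isnan(rho)]
    return sorted(valid, key=lambda hr: abs(hr[1]), reverse=True)[:top_k]


# Hypothetical warmth utilities and per-head, per-group scores.
warmth = {"group_A": -1.0, "group_B": -0.4, "group_C": 0.1, "group_D": 0.5, "group_E": 0.9}
head_scores = {
    (12, 5): {"group_A": 0.2, "group_B": 0.3, "group_C": 0.5, "group_D": 0.6, "group_E": 0.9},
    (3, 1): {"group_A": 0.9, "group_B": 0.7, "group_C": 0.4, "group_D": 0.2, "group_E": 0.1},
    (7, 0): {"group_A": 0.5, "group_B": 0.1, "group_C": 0.9, "group_D": 0.3, "group_E": 0.6},
}
top = rank_heads(head_scores, warmth)
for (layer, head), rho in top:
    print(f"layer {layer}, head {head}: rho = {rho:+.2f}")
```

Ranking by |ρ| treats a head whose scores run opposite to the utility order as equally "dimension-sensitive" as one that tracks it. Whether the sign matters depends on how the per-head score is defined, which this sketch leaves open.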
