Detecting Divisive Language: A Concept-Grounded, LLM-Guided Pipeline for Polarizing Social Media Sphere

Yuting He; Tianhao Li; Jiebiao Wang; Yongjun Zhang

doi:10.1609/icwsm.v20i1.42796

Authors

Yuting He, School of Journalism and Media, The University of Texas at Austin
Tianhao Li, School of Information, The University of Texas at Austin
Jiebiao Wang, Crown Family School of Social Work, Policy, and Practice, University of Chicago
Yongjun Zhang, Department of Sociology, Stony Brook University

Abstract

Political polarization poses a growing global challenge, yet existing NLP approaches typically rely on indirect proxies such as toxicity or sentiment, which fail to capture the identity-based antagonism that is central to polarizing discourse. We address this gap by conceptualizing polarization-related discourse as divisive language: language that explains political or social disagreement by attributing it to group-based identities. Building on this definition, we propose a staged training pipeline that uses large language models (LLMs) to generate definition-grounded supervision and progressively distills it into lightweight classifiers suitable for large-scale analysis. Experiments on social media data show that the resulting models substantially outperform zero-shot prompting and small-scale supervised baselines, while detecting forms of polarization that are not captured by toxicity- or sentiment-based methods. Our findings demonstrate that divisive language can be treated as a distinct, computable linguistic construct, enabling scalable and theoretically grounded analysis of political polarization.
