Online News Coverage of Critical Race Theory Controversies: A Dataset of Annotated Headlines

Anna Lieb; Maneesh Arora; Eni Mustafaraj

doi:10.1609/icwsm.v18i1.31441

Authors

Anna Lieb, Wellesley College, MA, USA
Maneesh Arora, Wellesley College, MA, USA
Eni Mustafaraj, Wellesley College, MA, USA

Abstract

In this paper, we introduce an annotated dataset of 11,704 unique U.S. news headlines related to critical race theory and its controversies from August 2020 through December 2022. Annotations generated by GPT-4 specify the headline stance and the primary actor in the headline. GPT-4 annotations performed well on the validation dataset, with weighted average F-scores of 0.8339 for headline stance annotations and 0.7625 for primary actor annotations. Along with the annotated headlines and URLs to the full articles, we augment the dataset with metrics that are relevant to future research on political polarization, news frame analysis, and regional news coverage. The dataset includes partisan audience bias scores by news source domain, tags for mentions of U.S. states in the article body, and exposure and engagement metrics for articles shared on Reddit. Among other preliminary descriptive analyses, we find that the most prevalent headline stance in our dataset is anti-CRT (43.06%), and the most prevalent primary actor is political influencers (56.56%). This paper describes the data collection methodology, preliminary descriptive analysis, and possible uses of the dataset for future research in political science, computational social sciences, and natural language processing. Our dataset and replication code are available on Zenodo at zenodo.org/doi/10.5281/zenodo.10516190.
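
The weighted average F-scores reported above can be illustrated with a minimal sketch. It assumes the standard definition, in which each class's F1 is weighted by that class's share of the reference labels; the abstract does not say which implementation the authors used. The function name `weighted_f1` and the placeholder labels `A`, `B`, and `C` are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Weighted-average F1: per-class F1, weighted by each class's
    share of the reference (true) labels."""
    support = Counter(y_true)  # number of reference items per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        score += (n / total) * f1
    return score


# Hypothetical placeholder labels, not drawn from the dataset.
reference = ["A", "A", "B", "C"]   # e.g. human validation labels
predicted = ["A", "B", "B", "C"]   # e.g. model-generated labels
print(round(weighted_f1(reference, predicted), 4))
```

Because the weights follow class frequency, a frequent class contributes more to the final score than a rare one, which matters when label distributions are skewed.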
