Online News Coverage of Critical Race Theory Controversies: A Dataset of Annotated Headlines


  • Anna Lieb, Wellesley College, MA, USA
  • Maneesh Arora, Wellesley College, MA, USA
  • Eni Mustafaraj, Wellesley College, MA, USA



In this paper, we introduce an annotated dataset of 11,704 unique U.S. news headlines related to critical race theory (CRT) and its controversies, published from August 2020 through December 2022. Annotations generated by GPT-4 specify the stance of each headline and its primary actor. The GPT-4 annotations performed well on the validation dataset, with weighted average F-scores of 0.8339 for headline stance and 0.7625 for primary actor. In addition to the annotated headlines and URLs to the full articles, the dataset includes metrics relevant to future research on political polarization, news frame analysis, and regional news coverage: partisan audience bias scores by news source domain, tags for mentions of U.S. states in the article body, and exposure and engagement metrics for articles shared on Reddit. Among other preliminary descriptive analyses, we find that the most prevalent headline stance in the dataset is anti-CRT (43.06%), and the most prevalent primary actor is political influencers (56.56%). This paper describes the data collection methodology, the preliminary descriptive analysis, and possible uses of the dataset for future research in political science, computational social science, and natural language processing. Our dataset and replication code are available on Zenodo.
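For readers unfamiliar with the metric, a weighted average F-score is commonly computed by taking the F1 score of each class and averaging them in proportion to how often each class appears in the gold labels. The sketch below assumes that convention; the abstract does not state the exact weighting the authors used. The function name `weighted_f1` and the toy label names are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores (assumed convention)."""
    support = Counter(y_true)  # number of gold examples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1  # weight each class by its share of gold labels
    return score


# Hypothetical toy labels, for illustration only.
gold = ["anti", "anti", "neutral", "defending"]
pred = ["anti", "neutral", "neutral", "defending"]
result = weighted_f1(gold, pred)
print(round(result, 4))
```

Weighting by class frequency means that frequent classes dominate the score, which matters when one label is far more common than the others.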




How to Cite

Lieb, A., Arora, M., & Mustafaraj, E. (2024). Online News Coverage of Critical Race Theory Controversies: A Dataset of Annotated Headlines. Proceedings of the International AAAI Conference on Web and Social Media, 18(1), 1979-1990.