Sampling the News Producers: A Large News and Feature Data Set for the Study of the Complex Media Landscape

Authors

  • Benjamin Horne, Rensselaer Polytechnic Institute
  • Sara Khedr, Rensselaer Polytechnic Institute
  • Sibel Adali, Rensselaer Polytechnic Institute

DOI:

https://doi.org/10.1609/icwsm.v12i1.14982

Keywords:

news, dataset, journalism, fake news

Abstract

The complexity and diversity of today's media landscape pose many challenges for researchers studying news producers. These producers use many different strategies to make readers believe their message: the writing styles they employ, repetition across different media sources with or without attribution, and other mechanisms that have yet to be studied in depth. To facilitate systematic studies in this area, we present a large political news data set containing over 136K news articles from 92 news sources, collected over 7 months of 2017. These news sources were carefully chosen to include well-established mainstream sources, maliciously fake sources, satire sources, and hyper-partisan political blogs. For each article, we compute 130 content-based and social media engagement features drawn from a wide range of literature on political bias, persuasion, and misinformation. Along with the data set, we release the source code for feature computation. In this paper, we discuss the first release of the data set and demonstrate four use cases of the data and features: news characterization, engagement characterization, news attribution and content copying, and discovering news narratives.

Published

2018-06-15

How to Cite

Horne, B., Khedr, S., & Adali, S. (2018). Sampling the News Producers: A Large News and Feature Data Set for the Study of the Complex Media Landscape. Proceedings of the International AAAI Conference on Web and Social Media, 12(1). https://doi.org/10.1609/icwsm.v12i1.14982