NewsDB: An Automated Approach to Build an Extensive Database of Self-Proclaimed News Providers

Authors

  • Salim Chouaki, CNRS, INRIA, Ecole Polytechnique, Institut Polytechnique Paris
  • Minh-Kha Nguyen, Université Grenoble Alpes, CNRS, INRIA, Grenoble INP
  • Laura Edelson, Northeastern University
  • Oana Goga, CNRS, INRIA, Ecole Polytechnique, Institut Polytechnique Paris
  • Tobias Lauinger, New York University
  • Damon McCoy, New York University

DOI:

https://doi.org/10.1609/icwsm.v20i1.42653

Abstract

The credibility of news obtained online has become a concern because individuals or groups can easily claim to be news publishers and share news-related content. Research on monitoring misleading information in the online news ecosystem is hindered because the community lacks a comprehensive and up-to-date list of social media pages and domains claiming to be news media. This paper employs an automated approach that uses Google's GNews API and Meta's CrowdTangle API to identify self-proclaimed news providers. Our method discovered 19k self-proclaimed news providers in the United States that were active in June 2022 and 23k that were active in October 2020. We also retrieve the posting history of the discovered pages, totaling 191,182,320 posts. Among other findings, our analysis reveals that, on average, 300 new self-proclaimed news pages are created every four months, 56% of them do not declare a managing organization, 15% of the identified news pages are news aggregators, and 57% declare themselves to be local news.

Published

2026-05-25

How to Cite

Chouaki, S., Nguyen, M.-K., Edelson, L., Goga, O., Lauinger, T., & McCoy, D. (2026). NewsDB: An Automated Approach to Build an Extensive Database of Self-Proclaimed News Providers. Proceedings of the International AAAI Conference on Web and Social Media, 20(1), 548–562. https://doi.org/10.1609/icwsm.v20i1.42653