Media Cloud 2.0: An Updated Open Web News Archive
DOI:
https://doi.org/10.1609/icwsm.v20i1.42778
Abstract
We present a completely re-engineered Media Cloud, a massive, searchable, open-source archive of digital news sources and content from around the globe. Since its previous presentation at ICWSM in 2021, the Media Cloud team has re-engineered the tool's data collection, storage, and retrieval systems; built a new front-end research interface; surpassed 1.8 billion stories; and reprocessed all the content to update the extracted metadata with consistent and modern techniques. In this paper we document the new system’s engineering, characterize the datasets to date, and describe the user-facing tools, which include a Directory of online news sources and a searchable Story Index of global news stories. We discuss the utility of the datasets, how they compare to related work, the challenges of maintaining open research infrastructure, and the research made possible by the datasets and tooling.
Published
2026-05-25
How to Cite
Bermejo, F., Bhargava, R., Budne, P., Gulley, P., Leon, E., McGrady, R., … Zuckerman, E. (2026). Media Cloud 2.0: An Updated Open Web News Archive. Proceedings of the International AAAI Conference on Web and Social Media, 20(1), 2735–2746. https://doi.org/10.1609/icwsm.v20i1.42778
Section
Dataset Papers