BuzzFace: A News Veracity Dataset with Facebook User Commentary and Egos

Giovanni Santia; Jake Williams

doi:10.1609/icwsm.v12i1.14985

Authors

Giovanni Santia, Drexel University
Jake Williams, Drexel University

DOI:

https://doi.org/10.1609/icwsm.v12i1.14985

Keywords:

Facebook, news veracity, Disqus, social bots

Abstract

Veracity assessment of news and social bot detection have become two of the most pressing issues for social media platforms, yet current gold-standard data are limited. This paper presents a sizeable, feature-rich gold-standard dataset that substantially extends what is available. The dataset was built from a collection of news items posted to Facebook by nine news outlets during September 2016, which were annotated for veracity by BuzzFeed. These articles were refined beyond binary annotation into four categories: mostly true, mostly false, mixture of true and false, and no factual content. Our contribution integrates data on Facebook comments and reactions publicly available through the platform’s Graph API, and provides tailored tools for accessing news article web content. The features of the accessed articles include body text, images, links, Facebook plugin comments, Disqus plugin comments, and embedded tweets. Embedded tweets offer a promising avenue for expansion across social media platforms. Upon development, this utility yielded over 1.6 million text items, making the dataset over 400 times larger than the current gold standard. The resulting dataset, BuzzFace, is presently the most extensive of its kind, and allows for more robust machine learning applications to news veracity assessment and social bot detection than previously possible.
