CREDBANK: A Large-Scale Social Media Corpus With Associated Credibility Annotations

Tanushree Mitra; Eric Gilbert

doi:10.1609/icwsm.v9i1.14625

Authors

Tanushree Mitra Georgia Institute of Technology
Eric Gilbert Georgia Institute of Technology

DOI:

https://doi.org/10.1609/icwsm.v9i1.14625

Keywords:

Social Media Credibility, Corpus creation, Micro labor annotations

Abstract

Social media has quickly risen to prominence as a news source, yet lingering doubts remain about its ability to spread rumor and misinformation. Systematically studying this phenomenon, however, has been difficult due to the need to collect large-scale, unbiased data along with in-situ judgements of its accuracy. In this paper we present CREDBANK, a corpus designed to bridge this gap by systematically combining machine and human computation. Specifically, CREDBANK is a corpus of tweets, topics, events and associated human credibility judgements. It is based on the real-time tracking of more than 1 billion streaming tweets over a period of more than three months, computational summarizations of those tweets, and intelligent routings of the tweet streams to human annotators — within a few hours of those events unfolding on Twitter. In total CREDBANK comprises more than 60 million tweets grouped into 1049 real-world events, each annotated by 30 human annotators. As an example, with CREDBANK one can quickly calculate that roughly 24% of the events in the global tweet stream are not perceived as credible. We have made CREDBANK publicly available, and hope it will enable new research questions related to online information credibility in fields such as social science, data mining and health.
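The kind of calculation the abstract mentions (the share of events not perceived as credible) could be sketched as follows. This is a minimal illustration, not the authors' method: the rating scale, the `top_rating` value, the `agreement_threshold`, and the toy data are all assumptions introduced here, and the function name `fraction_not_credible` is hypothetical.

```python
from typing import Dict, List


def fraction_not_credible(
    ratings_by_event: Dict[str, List[int]],
    top_rating: int = 2,               # assumed code for the most-credible rating
    agreement_threshold: float = 0.7,  # assumed cut-off, not taken from the source
) -> float:
    """Return the share of events in which fewer than `agreement_threshold`
    of the annotators gave the event the top credibility rating."""
    if not ratings_by_event:
        raise ValueError("no events supplied")

    not_credible = 0
    for ratings in ratings_by_event.values():
        # Proportion of this event's annotators who chose the top rating.
        share_top = sum(1 for r in ratings if r == top_rating) / len(ratings)
        if share_top < agreement_threshold:
            not_credible += 1
    return not_credible / len(ratings_by_event)


# Made-up toy data: 30 ratings per event, mirroring the 30 annotators per event.
toy_ratings = {
    "event_a": [2] * 27 + [1] * 3,   # 90% top rating
    "event_b": [2] * 15 + [0] * 15,  # 50% top rating
    "event_c": [2] * 24 + [1] * 6,   # 80% top rating
    "event_d": [-2] * 30,            # 0% top rating
}

print(fraction_not_credible(toy_ratings))
```

Applied to the real corpus, the same aggregation would run over all 1049 events; the result depends entirely on how "perceived as credible" is defined, which is why the threshold is left as a parameter.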
