VoterFraud2020: a Multi-modal Dataset of Election Fraud Claims on Twitter

Authors

  • Anton Abilov, Cornell Tech, Cornell University
  • Yiqing Hua, Cornell Tech, Cornell University
  • Hana Matatov, Technion
  • Ofra Amir, Technion
  • Mor Naaman, Cornell Tech, Cornell University

DOI:

https://doi.org/10.1609/icwsm.v15i1.18113

Keywords:

Qualitative and quantitative studies of social media, Social network analysis; communities identification; expertise and authority discovery

Abstract

The wide spread of unfounded election fraud claims surrounding the U.S. 2020 election undermined trust in the election, culminating in violence inside the U.S. Capitol. Under these circumstances, it is critical to understand the discussions surrounding these claims on Twitter, a major platform where the claims were disseminated. To this end, we collected and released the VoterFraud2020 dataset, a multi-modal dataset with 7.6M tweets and 25.6M retweets from 2.6M users related to voter fraud claims. To make this data immediately useful for a diverse set of research projects, we further enhance the data with cluster labels computed from the retweet graph, each user's suspension status, and the perceptual hashes of tweeted images. The dataset also includes aggregate data for all external links and YouTube videos that appear in the tweets. Preliminary analyses of the data show that Twitter's user suspension actions mostly affected a specific community of voter fraud claim promoters, and expose the most common URLs, images, and YouTube videos shared in the data.
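
To illustrate why perceptual hashes of tweeted images are useful, the following is a minimal sketch of one simple perceptual hash (an "average hash") and a Hamming-distance comparison. This page does not state which hashing algorithm the dataset uses, so the average hash here is a stand-in chosen for brevity; the function names and the tiny 2x2 grayscale "images" are hypothetical and not taken from the dataset.

```python
# Illustrative sketch only: average hash is an assumed stand-in technique,
# not necessarily the perceptual hash used to build the dataset.

def average_hash(pixels):
    """Hash a grayscale image given as a list of rows of intensities.

    Each pixel contributes one bit: 1 if it is brighter than the image
    mean, 0 otherwise. Visually similar images tend to share most bits.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a, b):
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


# Hypothetical toy images (2x2 grayscale grids).
original = [[10, 200], [220, 30]]
brighter = [[30, 220], [240, 50]]    # same pattern, uniformly brighter
different = [[200, 10], [30, 220]]   # inverted pattern

h_original = average_hash(original)
h_brighter = average_hash(brighter)
h_different = average_hash(different)

# A small distance suggests near-duplicate images; a large one suggests
# unrelated images.
print(hamming_distance(h_original, h_brighter))
print(hamming_distance(h_original, h_different))
```

Under these assumptions, a uniformly brightened copy hashes to the same value as the original, while the inverted pattern differs in every bit. This is the property that lets hash values group re-uploads and lightly edited copies of the same image.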

Published

2021-05-22

How to Cite

Abilov, A., Hua, Y., Matatov, H., Amir, O., & Naaman, M. (2021). VoterFraud2020: a Multi-modal Dataset of Election Fraud Claims on Twitter. Proceedings of the International AAAI Conference on Web and Social Media, 15(1), 901-912. https://doi.org/10.1609/icwsm.v15i1.18113