Twitter Corpus of the #BlackLivesMatter Movement and Counter Protests: 2013 to 2021


  • Salvatore Giorgi, University of Pennsylvania; National Institute on Drug Abuse
  • Sharath Chandra Guntuku, University of Pennsylvania
  • Mckenzie Himelein-Wachowiak, National Institute on Drug Abuse
  • Amy Kwarteng, National Institute on Drug Abuse
  • Sy Hwang, University of Pennsylvania
  • Muhammad Rahman, National Institute on Drug Abuse
  • Brenda Curtis, National Institute on Drug Abuse



Web and Social Media


Black Lives Matter (BLM) is a decentralized social movement protesting violence against Black individuals and communities, with a focus on police brutality. The movement gained significant attention following the killings of Ahmaud Arbery, Breonna Taylor, and George Floyd in 2020. The #BlackLivesMatter social media hashtag has come to represent the grassroots movement, while similar hashtags, such as #AllLivesMatter and #BlueLivesMatter, have been used to counter-protest it. We introduce a data set of 63.9 million tweets from 13.0 million users in over 100 countries, each containing at least one of the following keywords: BlackLivesMatter, AllLivesMatter, or BlueLivesMatter. This data set contains all currently available tweets from the beginning of the BLM movement in 2013 to 2021. We summarize the data set and show temporal trends in the use of both the BlackLivesMatter keyword and the keywords associated with counter movements. Additionally, for each keyword, we create and release a set of Latent Dirichlet Allocation (LDA) topics (i.e., automatically clustered groups of semantically co-occurring words) to aid researchers in identifying linguistic patterns across the three keywords.
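
As an illustration of the keyword-based selection and temporal summary described above, the following is a minimal Python sketch, not the authors' pipeline. It assumes case-insensitive substring matching (the abstract does not state the matching rule) and a hypothetical input layout of (ISO date, tweet text) pairs; the function names matched_keywords and monthly_counts are invented for this example.

```python
import re
from collections import Counter

# The three keywords named in the abstract.
KEYWORDS = ("BlackLivesMatter", "AllLivesMatter", "BlueLivesMatter")

# Assumption: case-insensitive substring matching; the actual rule
# used to build the corpus is not specified in the abstract.
_PATTERN = re.compile("|".join(KEYWORDS), re.IGNORECASE)
_CANONICAL = {k.lower(): k for k in KEYWORDS}


def matched_keywords(text):
    """Return the set of keywords (canonical casing) found in a tweet's text."""
    return {_CANONICAL[m.group(0).lower()] for m in _PATTERN.finditer(text)}


def monthly_counts(tweets):
    """Count tweets per (YYYY-MM, keyword).

    `tweets` is an iterable of (iso_date, text) pairs; this layout is
    hypothetical and chosen only for the example. A tweet containing
    several keywords is counted once under each of them.
    """
    counts = Counter()
    for iso_date, text in tweets:
        month = iso_date[:7]  # "YYYY-MM-DD" -> "YYYY-MM"
        for keyword in matched_keywords(text):
            counts[(month, keyword)] += 1
    return counts
```

Calling monthly_counts on a list of dated tweets yields a Counter keyed by month and keyword, which can be plotted directly to compare the three keywords over time.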




How to Cite

Giorgi, S., Guntuku, S. C., Himelein-Wachowiak, M., Kwarteng, A., Hwang, S., Rahman, M., & Curtis, B. (2022). Twitter Corpus of the #BlackLivesMatter Movement and Counter Protests: 2013 to 2021. Proceedings of the International AAAI Conference on Web and Social Media, 16(1), 1228-1235.