Population Bias in Geotagged Tweets

Authors

  • Momin Malik, Carnegie Mellon University
  • Hemank Lamba, Carnegie Mellon University
  • Constantine Nakos, Carnegie Mellon University
  • Jürgen Pfeffer, Carnegie Mellon University

DOI:

https://doi.org/10.1609/icwsm.v9i4.14688

Keywords:

Twitter, bias, representative, geotagged, geocoded, spatial errors

Abstract

Geotagged tweets are an exciting and increasingly popular data source, but like all social media data, they are potentially biased in who is represented. Motivated by this, we investigate the question, 'Are users of geotagged tweets randomly distributed over the US population?' We link approximately 144 million geotagged tweets within the US, representing 2.6 million unique users, to high-resolution Census population data and carry out a statistical test that answers this question strongly in the negative. We use spatial models and integrate further Census data to investigate the factors associated with this nonrandom distribution. We find that, controlling for other factors, population has no effect on the number of geotag users; instead, that number is predicted by a number of factors, including higher median income, being in an urban area, being further east or on a coast, having more young people, and having high Asian, Black, or Hispanic/Latino populations.
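
The abstract does not name the statistical test used. As a rough illustration of what "randomly distributed over the population" could mean in practice, the sketch below assumes a chi-square goodness-of-fit test in which each region's expected user count is proportional to its population. The region populations, the user counts, and the function name chi_square_statistic are all hypothetical toy values chosen for illustration; they are not data or code from the paper.

```python
# Hypothetical toy data (NOT from the paper): population and number of
# geotag users observed in four made-up regions.
populations = [1000, 2000, 3000, 4000]
observed_users = [30, 20, 25, 25]


def chi_square_statistic(observed, populations):
    """Chi-square goodness-of-fit statistic against the null hypothesis
    that users are spread across regions in proportion to population."""
    total_obs = sum(observed)
    total_pop = sum(populations)
    stat = 0.0
    for obs, pop in zip(observed, populations):
        # Expected count if users were a random sample of the population.
        expected = total_obs * pop / total_pop
        stat += (obs - expected) ** 2 / expected
    return stat


stat = chi_square_statistic(observed_users, populations)
dof = len(populations) - 1

# Standard chi-square critical value for 3 degrees of freedom at the
# 0.05 significance level.
CRITICAL_0_05_DF3 = 7.815

print(f"chi-square = {stat:.2f} with {dof} degrees of freedom")
print("reject proportionality" if stat > CRITICAL_0_05_DF3
      else "cannot reject proportionality")
```

In this toy case the smallest region has far more users than its population share would predict, so the statistic is large and proportionality is rejected. A test of this kind says only that the distribution is nonrandom; explaining why requires models with additional covariates, as the abstract goes on to describe.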

Published

2021-08-03

How to Cite

Malik, M., Lamba, H., Nakos, C., & Pfeffer, J. (2021). Population Bias in Geotagged Tweets. Proceedings of the International AAAI Conference on Web and Social Media, 9(4), 18-27. https://doi.org/10.1609/icwsm.v9i4.14688