Population Bias in Geotagged Tweets

Momin Malik; Hemank Lamba; Constantine Nakos; Jürgen Pfeffer

doi:10.1609/icwsm.v9i4.14688

Authors

Momin Malik, Carnegie Mellon University
Hemank Lamba, Carnegie Mellon University
Constantine Nakos, Carnegie Mellon University
Jürgen Pfeffer, Carnegie Mellon University

Keywords:

Twitter, bias, representative, geotagged, geocoded, spatial errors

Abstract

Geotagged tweets are an exciting and increasingly popular data source, but like all social media data, they may be biased in who is represented. Motivated by this, we investigate the question: are users of geotagged tweets randomly distributed over the US population? We link approximately 144 million geotagged tweets within the US, representing 2.6 million unique users, to high-resolution Census population data and carry out a statistical test that answers this question strongly in the negative. We use spatial models and integrate further Census data to investigate the factors associated with this nonrandom distribution. We find that, controlling for other factors, population has no effect on the number of geotag users. Instead, that number is predicted by several factors, including higher median income, being in an urban area, being further east or on a coast, having more young people, and having high Asian, Black, or Hispanic/Latino populations.
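The abstract does not name the statistical test used. To make the null hypothesis concrete ("geotag users are spread over regions in proportion to population"), here is a minimal sketch of one generic way to test it: a chi-square goodness-of-fit statistic with a Monte Carlo p-value. This is an assumed, illustrative technique, not the authors' method. The function names and the per-region counts are hypothetical.

```python
import random


def expected_counts(observed, population):
    """Counts expected if users were allocated to regions in proportion to population."""
    n = sum(observed)
    total_pop = sum(population)
    return [n * p / total_pop for p in population]


def chi_square_stat(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over regions."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def monte_carlo_p_value(observed, population, trials=2000, seed=0):
    """Hypothetical helper: simulate the null by drawing users from regions
    with probability proportional to population, then count how often the
    simulated statistic is at least as extreme as the observed one."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    expected = expected_counts(observed, population)
    observed_stat = chi_square_stat(observed, expected)
    n = sum(observed)
    regions = range(len(population))
    extreme = 0
    for _ in range(trials):
        simulated = [0] * len(population)
        for r in rng.choices(regions, weights=population, k=n):
            simulated[r] += 1
        if chi_square_stat(simulated, expected) >= observed_stat:
            extreme += 1
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    return observed_stat, (extreme + 1) / (trials + 1)


# Hypothetical example: four regions, 100 users.
population = [1000, 2000, 3000, 4000]
print(monte_carlo_p_value([10, 20, 30, 40], population))  # proportional: p = 1.0
print(monte_carlo_p_value([70, 10, 10, 10], population))  # skewed: very small p
```

With millions of users, simulation becomes expensive. The asymptotic chi-square distribution, for example via `scipy.stats.chisquare(f_obs, f_exp)`, would be the usual substitute. The abstract's finding that population has no effect once other factors are controlled comes from the authors' spatial models, which this sketch does not attempt to reproduce.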
