Inferring Gender from the Content of Tweets: A Region Specific Example

Authors

  • Clay Fink The Johns Hopkins University
  • Jonathon Kopecky The Johns Hopkins University
  • Maksym Morawski The University of Maryland Baltimore County

DOI:

https://doi.org/10.1609/icwsm.v6i1.14320

Keywords:

social media, twitter, gender, demography

Abstract

There is growing interest in using social networking sites such as Twitter to gather real-time data on the reactions and opinions of a region's population, including locations in the developing world where social media has played an important role in recent events, such as the 2011 Arab Spring. However, opinions and reactions may differ significantly between subpopulations within a given region, for example by gender or ethnicity. Unfortunately, the demographic characteristics of social media users are often unknown because such categories are not always captured in user metadata. Twitter, for example, does not capture a user's gender in their profile, and inferring gender from first names is difficult since Twitter users are not required to give their real names. There is thus a need for automated methods that can infer such hidden attributes of users from other data sources. In this paper we describe a method to infer the gender of Twitter users from only the content of their tweets. Looking at Twitter users from the West African nation of Nigeria, we trained a supervised machine learning classifier on features derived from the content of user tweets. Using unigram features alone, we obtained an accuracy of 80% for predicting gender, suggesting that content alone can be a good predictor of gender. An analysis of the highest weighted features shows some interesting topical and emotional distinctions between men and women. We argue that approaches such as the one described here can give us a clearer picture of who is using social media when certain user attributes are unreliable or not available.
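
The general idea of classifying users from unigram features and then inspecting the highest weighted features can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the learning algorithm, so a simple perceptron is used here as a stand-in linear classifier. The function names (`user_features`, `train_perceptron`, `predict`), the placeholder tokens, and the toy training data are all invented for illustration, and the labels +1 and -1 simply stand for the two gender classes.

```python
from collections import Counter, defaultdict


def user_features(tweets):
    """Pool all of one user's tweets into a single bag of unigram counts."""
    bag = Counter()
    for tweet in tweets:
        bag.update(tweet.lower().split())  # naive whitespace tokenization
    return bag


def train_perceptron(examples, epochs=10):
    """Train a linear model on (unigram Counter, label) pairs, label in {+1, -1}."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for feats, label in examples:
            score = bias + sum(weights[w] * c for w, c in feats.items())
            if label * score <= 0:  # misclassified (or on the boundary): update
                for w, c in feats.items():
                    weights[w] += label * c
                bias += label
    return weights, bias


def predict(weights, bias, feats):
    """Return +1 or -1 for a bag of unigram counts."""
    score = bias + sum(weights.get(w, 0.0) * c for w, c in feats.items())
    return 1 if score > 0 else -1


# Toy, invented data: each entry is (list of a user's tweets, class label).
# The tokens are placeholders, not words taken from the paper.
TOY_USERS = [
    (["alpha beta", "alpha gamma"], 1),
    (["delta epsilon", "delta gamma"], -1),
    (["alpha epsilon beta"], 1),
    (["delta beta epsilon"], -1),
]

examples = [(user_features(tweets), label) for tweets, label in TOY_USERS]
weights, bias = train_perceptron(examples)

# Accuracy on the toy training set (a real study would use held-out users).
correct = sum(predict(weights, bias, f) == y for f, y in examples)
accuracy = correct / len(examples)

# Unigrams ordered from most indicative of class +1 to most indicative of -1.
ranked = sorted(weights, key=weights.get, reverse=True)
print(accuracy, ranked)
```

Sorting the learned weights, as in the last step, is one simple way to see which unigrams a linear model relies on most for each class. On real data, accuracy should be measured on users held out from training rather than on the training set as in this toy example.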

Published

2021-08-03

How to Cite

Fink, C., Kopecky, J., & Morawski, M. (2021). Inferring Gender from the Content of Tweets: A Region Specific Example. Proceedings of the International AAAI Conference on Web and Social Media, 6(1), 459-462. https://doi.org/10.1609/icwsm.v6i1.14320