Hierarchical Bayesian Models for Latent Attribute Detection in Social Media
We present several novel minimally-supervised models for detecting latent attributes of social media users, with a focus on ethnicity and gender. Previous work on ethnicity detection has used coarse-grained, widely separated classes of ethnicity and assumed the existence of large amounts of training data such as the US census, simplifying the problem. Instead, we examine content generated by users in addition to name morpho-phonemics to detect ethnicity and gender. Further, we address this problem in a challenging setting where the ethnicity classes are more fine-grained (ethnicity classes in Nigeria) and where training data is very limited.