Geolocation Prediction in Twitter Using Social Networks: A Critical Analysis and Review of Current Practice
Keywords: Twitter, geoinference, geolocation, spatial location, location, social media, comparative analysis
Geolocated social media data provides a powerful source of information about place and regional human behavior. Because little social media data is geolocation-annotated, inference techniques play an essential role in increasing the volume of annotated data. One major class of inference approaches relies on the social network of Twitter, where the locations of a user's friends serve as evidence for that user's location. While many such inference techniques have recently been proposed, little is known about their relative performance: across prior studies, the amount of ground truth data varies between 5% and 100% of the network, the size of the social network varies by four orders of magnitude, and evaluation metrics are not standardized. We conduct a systematic comparative analysis of nine state-of-the-art network-based methods for performing geolocation inference at the global scale, controlling for the source of ground truth data, dataset size, and temporal recency in test data. Furthermore, we identify a comprehensive set of evaluation metrics that clarify performance differences. Our analysis identifies a large disparity between the performance reported in the literature and that seen in real-world conditions. To aid reproducibility and future comparison, all implementations have been released in an open source geoinference package.
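As a minimal illustration of the idea that friends' locations serve as evidence for a user's location, the sketch below picks the friend location that minimizes the total great-circle distance to all other friend locations (a medoid). It is an assumption chosen for illustration, not necessarily one of the nine methods compared here, and the names `haversine_km` and `infer_location` are hypothetical rather than part of the released geoinference package.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def infer_location(friend_locations):
    """Return the friend location with the smallest summed distance to all
    friend locations, or None when no friend has a known location."""
    if not friend_locations:
        return None
    return min(
        friend_locations,
        key=lambda candidate: sum(haversine_km(candidate, p) for p in friend_locations),
    )


# Illustrative input only: two friends near New York, one in Los Angeles.
friends = [(40.71, -74.01), (40.73, -73.99), (34.05, -118.24)]
predicted = infer_location(friends)
```

Because the medoid is always one of the observed points, a single distant friend cannot pull the estimate into an unpopulated area, which is one reason to prefer it over a plain coordinate average in a sketch like this.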