HINENI: Human Identity across the Nations of the Earth Ngram Investigator
DOI:
https://doi.org/10.1609/icwsm.v18i1.31331
Abstract
Self-reported biographical strings on social media profiles provide a powerful tool to study self-identity. We present HINENI, a dataset of 420 million Twitter user profiles collected over a 12-year period and partitioned into 32 distinct national cohorts, which we believe is the largest publicly available data resource for identity research. We report on the major design decisions underlying HINENI, including a new notion of sampling (k-persistence) that spans the divide between traditional cross-sectional and longitudinal approaches. We demonstrate the power of HINENI to study the relative survival rate (half-life) of different tokens, and the use of emoji analysis across national cohorts to study the effects of gender, national, and sports identities.
Published
2024-05-28
How to Cite
Handzlik, D., Jones, J. J., & Skiena, S. S. (2024). HINENI: Human Identity across the Nations of the Earth Ngram Investigator. Proceedings of the International AAAI Conference on Web and Social Media, 18(1), 515–527. https://doi.org/10.1609/icwsm.v18i1.31331
Section
Full Papers