Twitter-STMHD: An Extensive User-Level Database of Multiple Mental Health Disorders

Authors

  • Suhavi, Indraprastha Institute of Information Technology, Delhi, India
  • Asmit Kumar Singh, Indraprastha Institute of Information Technology, Delhi, India
  • Udit Arora, Indraprastha Institute of Information Technology, Delhi, India
  • Somyadeep Shrivastava, Indian Institute of Information Technology, Dharwad, India
  • Aryaveer Singh, Guru Gobind Singh Indraprastha University, Delhi, India
  • Rajiv Ratn Shah, Indraprastha Institute of Information Technology, Delhi, India
  • Ponnurangam Kumaraguru, International Institute of Information Technology, Hyderabad, India

Keywords:

Human-computer interaction; social media tools; navigation and visualization, Credibility of online content, Psychological, personality-based and ethnographic studies of social media, Social media usage on mobile devices; location, human mobility, and behavior

Abstract

Social media makes it possible to track and quantify user behavior, which establishes it as an appropriate resource for mental health studies. However, previous work in this area has been limited by a lack of data and of contextually relevant information. There is a need for large-scale, well-labeled mental health datasets, together with fast, reproducible methods that allow them to be expanded using heuristics. In this paper, we address this need by building the Twitter - Self-Reported Temporally-Contextual Mental Health Diagnosis Dataset (Twitter-STMHD), a large-scale, user-level dataset grouped into 8 disorder categories and a companion class of control users. 60% of the dataset is hand-annotated; this annotation led to the creation of high-precision patterns for self-reported diagnosis statements, which were then used to construct the rest of the dataset. Rather than a corpus of individual tweets, the dataset is a collection of profiles of users suffering from mental health disorders, which provides a holistic view of the problem. Leveraging temporal information, the data for each profile were collected over three disease-prevalence periods (onset of the disorder, diagnosis, and progression), along with a fourth period covering COVID-19. It is the only, and the largest, dataset that captures the tweeting activity of users suffering from mental health disorders during the COVID-19 period.
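The abstract describes two mechanisms: matching self-reported diagnosis statements with high-precision patterns, and assigning each user's tweets to time periods anchored on the diagnosis. The sketch below illustrates both ideas under clearly stated assumptions. The regular expressions, the three-disorder subset, the 30-day diagnosis window, and the COVID-19 start date are all hypothetical choices for illustration; they are not the authors' actual patterns, category list, or period boundaries.

```python
import re
from datetime import date, timedelta

# HYPOTHETICAL subset of disorder terms; not the paper's 8 categories or patterns.
DISORDER_TERMS = {
    "depression": r"depression|major depressive disorder|mdd",
    "adhd": r"adhd|attention deficit",
    "ocd": r"ocd|obsessive[- ]compulsive disorder",
}

# HYPOTHETICAL period boundaries, chosen only for illustration.
DIAGNOSIS_WINDOW = timedelta(days=30)  # half-width of the "diagnosis" period
COVID_START = date(2020, 3, 11)        # assumed start of the COVID-19 period


def compile_patterns(terms):
    """Build one first-person self-report regex per disorder label."""
    compiled = {}
    for label, alternatives in terms.items():
        compiled[label] = re.compile(
            r"\bi\s+(?:was|am|have\s+been|got)\s+"       # "I was", "I have been", ...
            r"(?:just\s+|recently\s+|officially\s+)?"     # optional adverb
            r"diagnosed\s+with\s+(?:severe\s+|mild\s+)?"  # optional severity word
            r"(?:" + alternatives + r")\b",
            re.IGNORECASE,
        )
    return compiled


def detect_self_reports(text, patterns):
    """Return the sorted labels whose self-report pattern matches the text."""
    return sorted(label for label, pat in patterns.items() if pat.search(text))


def assign_period(tweet_date, diagnosis_date):
    """Bucket a tweet relative to the user's diagnosis date.

    Simplification (assumption): any tweet on or after COVID_START is
    assigned to the COVID-19 period, taking precedence over the others.
    """
    if tweet_date >= COVID_START:
        return "covid19"
    if tweet_date < diagnosis_date - DIAGNOSIS_WINDOW:
        return "onset"
    if tweet_date <= diagnosis_date + DIAGNOSIS_WINDOW:
        return "diagnosis"
    return "progression"


if __name__ == "__main__":
    patterns = compile_patterns(DISORDER_TERMS)
    print(detect_self_reports("Today I was officially diagnosed with OCD.", patterns))
    print(assign_period(date(2019, 1, 5), date(2019, 6, 1)))
```

Requiring a first-person subject ("I was diagnosed with ...") trades recall for precision: a tweet such as "My friend was diagnosed with depression" does not match. This mirrors, in spirit only, the abstract's emphasis on high-precision patterns.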


Published

2022-05-31

How to Cite

Suhavi, Singh, A. K., Arora, U., Shrivastava, S., Singh, A., Shah, R. R., & Kumaraguru, P. (2022). Twitter-STMHD: An Extensive User-Level Database of Multiple Mental Health Disorders. Proceedings of the International AAAI Conference on Web and Social Media, 16(1), 1182-1191. Retrieved from https://ojs.aaai.org/index.php/ICWSM/article/view/19368