A Co-Training Model with Label Propagation on a Bipartite Graph to Identify Online Users with Disabilities
Collecting data from representative users with disabilities for accessibility research is time- and resource-consuming. With the proliferation of social media websites, many online spaces have emerged for people with disabilities. The information accumulated in these spaces is of great value for data collection and participant recruitment. However, such spaces also contain many active non-representative users, such as medical practitioners, caretakers, and family members. In this work, we introduce a novel co-training model based on the homophily phenomenon observed among online users with the same disability. The model combines a variational label propagation algorithm and a naive Bayes classifier to identify online users who have the same disability. We evaluated this model on a dataset collected from Reddit, and the results show improvements over traditional models.
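To make the co-training idea concrete, the following is a minimal sketch and not the model evaluated in this work. It rests on several assumptions that the abstract does not state: the bipartite graph links users to the threads they post in, plain mean-based label propagation stands in for the variational variant, the text view is a multinomial naive Bayes over bag-of-words tokens, and each view passes its single most confident prediction to the other per round. All function names, parameters, and the toy data are hypothetical.

```python
import math
from collections import Counter


def propagate(edges, seeds, users, iters=20):
    """Plain label propagation on a user-thread bipartite graph (seeds stay clamped)."""
    score = {u: seeds.get(u, 0.5) for u in users}
    for _ in range(iters):
        # Users -> threads: a thread's score is the mean score of its users.
        t_sum, t_cnt = Counter(), Counter()
        for u, t in edges:
            t_sum[t] += score[u]
            t_cnt[t] += 1
        # Threads -> users: a user's score is the mean score of their threads.
        u_sum, u_cnt = Counter(), Counter()
        for u, t in edges:
            u_sum[u] += t_sum[t] / t_cnt[t]
            u_cnt[u] += 1
        for u in users:
            if u not in seeds and u_cnt[u]:
                score[u] = u_sum[u] / u_cnt[u]
    return score


def naive_bayes(docs, labels, users):
    """Multinomial naive Bayes with add-one smoothing; returns P(label = 1) per user."""
    vocab = {w for u in users for w in docs[u]}
    counts, n = {0: Counter(), 1: Counter()}, Counter()
    for u, y in labels.items():
        counts[y].update(docs[u])
        n[y] += 1
    probs = {}
    for u in users:
        logp = {}
        for y in (0, 1):
            total = sum(counts[y].values()) + len(vocab)
            logp[y] = math.log((n[y] + 1) / (len(labels) + 2))
            logp[y] += sum(math.log((counts[y][w] + 1) / total) for w in docs[u])
        m = max(logp.values())
        e = {y: math.exp(logp[y] - m) for y in (0, 1)}
        probs[u] = e[1] / (e[0] + e[1])
    return probs


def co_train(edges, docs, seeds, rounds=3, k=1):
    """Each view labels its k most confident unlabeled users for the other view."""
    users = sorted(docs)
    lp_labels, nb_labels = dict(seeds), dict(seeds)

    def run_views():
        lp = propagate(edges, {u: float(y) for u, y in lp_labels.items()}, users)
        return lp, naive_bayes(docs, nb_labels, users)

    for _ in range(rounds):
        lp, nb = run_views()
        for probs, target in ((lp, nb_labels), (nb, lp_labels)):
            pool = [u for u in users if u not in target and probs[u] != 0.5]
            pool.sort(key=lambda u: abs(probs[u] - 0.5), reverse=True)
            for u in pool[:k]:
                target[u] = int(probs[u] >= 0.5)
    lp, nb = run_views()
    # Final decision: average the two views' scores and threshold at 0.5.
    return {u: int((lp[u] + nb[u]) / 2 >= 0.5) for u in users}


# Toy data (hypothetical): label 1 = user with the target disability, 0 = other user.
edges = [("u1", "t1"), ("u2", "t1"), ("u3", "t1"),
         ("u4", "t2"), ("u5", "t2"), ("u6", "t2")]
docs = {
    "u1": ["wheelchair", "ramp", "commute"],
    "u2": ["wheelchair", "ramp"],
    "u3": ["cane", "commute"],
    "u4": ["patient", "therapy", "clinic"],
    "u5": ["patient", "clinic"],
    "u6": ["therapy", "patient"],
}
seeds = {"u1": 1, "u4": 0}
result = co_train(edges, docs, seeds)
print(result)
```

In this sketch the graph view captures homophily (users who share threads tend to share a label), while the text view captures vocabulary. Exchanging only high-confidence predictions is one common co-training design choice, used here to limit how far an error in one view can spread to the other.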