Voice for the Voiceless: Active Sampling to Detect Comments Supporting the Rohingyas
The Rohingya refugee crisis is one of the biggest humanitarian crises of modern times, with more than 700,000 Rohingyas rendered homeless according to the United Nations High Commissioner for Refugees. While it has received sustained press attention globally, no comprehensive research has examined social media content pertaining to this large, evolving crisis. In this work, we construct a substantial corpus of YouTube video comments (263,482 comments from 113,250 users in 5,153 relevant videos) with the aim of analyzing the possible role of AI in helping a marginalized community. Using a novel combination of multiple active learning strategies, together with a new active sampling strategy based on nearest neighbors in the comment-embedding space, we construct a classifier that can detect comments defending the Rohingyas among larger numbers of disparaging and neutral ones. We advocate that, beyond the burgeoning field of hate speech detection, automatic detection of help speech can lend a voice to voiceless people and make the internet safer for marginalized communities.
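To make the idea of sampling by nearest neighbors in the comment-embedding space concrete, the following is a minimal sketch, not the procedure used in this work. It assumes that each comment has already been mapped to a fixed-length vector, that cosine similarity is the distance measure, and that a small set of comments already labeled as defending the Rohingyas serves as seeds. The function names `cosine` and `nearest_neighbor_sample`, the parameter `k`, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbor_sample(seed_vecs, pool_vecs, k=1):
    """Return sorted indices of unlabeled pool comments that fall among the
    k nearest neighbors (by cosine similarity) of at least one seed comment.

    Assumption: the selected comments would then be sent to a human annotator.
    """
    picked = set()
    for seed in seed_vecs:
        ranked = sorted(
            range(len(pool_vecs)),
            key=lambda i: cosine(seed, pool_vecs[i]),
            reverse=True,
        )
        picked.update(ranked[:k])
    return sorted(picked)


# Toy 2-d "embeddings" (hypothetical values, for illustration only).
seeds = [[1.0, 0.0]]                                     # labeled positive comments
pool = [[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0], [0.7, 0.7]]  # unlabeled comments

candidates = nearest_neighbor_sample(seeds, pool, k=2)
print(candidates)
```

The design intuition under these assumptions is that when the target class is rare, unlabeled comments lying close to known positives are more likely to be positives themselves than comments drawn uniformly at random, so annotation effort is spent where it is most likely to yield examples of the minority class.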