Tracking Disaster Footprints with Social Streaming Data
Social media has become an indispensable tool during natural disasters because of its broad appeal and its ability to disseminate information quickly. For instance, disaster responders use Twitter to search for (1) topics that have remained of particular interest over time, i.e., common topics such as “disaster rescue”; and (2) newly emerging themes of disaster-related discussion that are rapidly gathering in social media streams (Saha and Sindhwani 2012), i.e., distinct topics such as “the latest tsunami destruction”. To understand the status quo and allocate limited resources to the most urgent areas, emergency managers need to quickly sift through the relevant topics generated over time and assess their commonness and distinctiveness. A major obstacle to the effective use of social media, however, is the massive amount of noisy and undesired data it contains. Hence, a naive method, such as taking set intersections and differences to find common and distinct topics, is often impractical. To address this challenge, this paper studies a new topic tracking problem that seeks to effectively identify the common and distinct topics in social streaming data. The problem is important because it offers a promising new way to efficiently search for accurate information during emergency response. Our framework combines an online Nonnegative Matrix Factorization (NMF) scheme that conducts a faster update of latent factors with a joint NMF technique that balances the reconstruction error of topic identification against the losses induced by discovering common and distinct topics. Extensive experimental results on real-world datasets collected during Hurricanes Harvey and Florence demonstrate the effectiveness of our framework.
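For readers unfamiliar with NMF-based topic identification, the following minimal sketch may help fix ideas. It is not the online or joint scheme studied in this paper: it shows only plain batch NMF with the standard multiplicative updates, applied to a made-up document-term matrix. The function name `nmf_topics`, the toy vocabulary, and the counts are all hypothetical and chosen purely for illustration.

```python
import numpy as np

def nmf_topics(X, n_topics, n_iter=200, eps=1e-9, seed=0):
    """Plain batch NMF via multiplicative updates, so that X ~ W @ H.

    Illustrative only; this is not the paper's online or joint NMF.
    X: nonnegative (documents x terms) matrix.
    Returns W (documents x topics) and H (topics x terms).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_terms = X.shape
    # Strictly positive random initialization keeps the updates well defined.
    W = rng.random((n_docs, n_topics)) + eps
    H = rng.random((n_topics, n_terms)) + eps
    for _ in range(n_iter):
        # Multiplicative updates preserve nonnegativity of W and H.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy data: 4 short messages over a 4-word vocabulary.
vocab = ["rescue", "shelter", "flood", "tsunami"]
X = np.array([
    [2, 1, 0, 0],
    [3, 1, 0, 0],
    [0, 0, 2, 3],
    [0, 0, 1, 2],
], dtype=float)

W, H = nmf_topics(X, n_topics=2)
error = np.linalg.norm(X - W @ H)  # Frobenius reconstruction error

# List the highest-weighted words of each learned topic.
for k, row in enumerate(H):
    top_words = [vocab[j] for j in np.argsort(row)[::-1][:2]]
    print(f"topic {k}: {top_words}")
```

Each row of `H` can be read as a topic (a weighting over words), and each row of `W` as the topic mixture of one message. Comparing topic rows learned from different time windows is the kind of operation for which a noise-tolerant notion of common and distinct topics is needed, rather than exact set intersection or difference.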