Distant-Supervision of Heterogeneous Multitask Learning for Social Event Forecasting With Multilingual Indicators
Keywords: multi-task learning, multi-instance learning, spatial event forecasting
Open-source indicators such as social media can be effective precursors for forecasting future societal events. Because events are often preceded by social indicators generated by groups of people speaking many different languages, multiple languages must be considered to ensure comprehensive event forecasting. However, this poses several technical challenges for traditional models: 1) high dimensionality, sparsity, and redundancy of features; 2) translation correlation among the multilingual features; and 3) lack of language-wise supervision. To address these issues simultaneously, we present a novel model, distant-supervision of heterogeneous multitask learning (DHML), for multilingual spatial social event forecasting. The model maps the multilingual heterogeneous features into several latent semantic spaces and then enforces a similar sparsity pattern across all of them, using distant supervision across all the languages involved. Optimizing this model is a difficult nonconvex and nonsmooth problem, which is decomposed into simpler subproblems using the Alternating Direction Method of Multipliers (ADMM). A novel dynamic programming-based algorithm is proposed to solve one challenging subproblem efficiently. Theoretical properties of the proposed algorithm are analyzed. Extensive experiments on multiple real-world datasets demonstrate the effectiveness, efficiency, and interpretability of the proposed approach.
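As a loose illustration of what a shared sparsity pattern across tasks can look like, the minimal sketch below applies row-wise group soft-thresholding, the proximal operator of the L2,1 norm, which is a standard device in multitask feature selection and a typical closed-form subproblem inside ADMM-style solvers. This is an assumption made for illustration only: the abstract does not specify the regularizer actually used by DHML, and the function name `prox_l21`, the weight matrix `W` (rows are features, columns are tasks), and the threshold `tau` are hypothetical.

```python
import numpy as np


def prox_l21(W, tau):
    """Row-wise group soft-thresholding.

    Solves argmin_Z 0.5 * ||Z - W||_F^2 + tau * sum_i ||Z[i, :]||_2.
    Each row (one feature across all tasks) is shrunk toward zero as a
    group, so a feature is either kept for every task or dropped for all.
    """
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return scale * W


# Hypothetical weights: 3 features (rows) shared by 2 tasks (columns).
W = np.array([[3.0, 4.0],
              [0.1, 0.1],
              [0.0, 2.0]])
Z = prox_l21(W, tau=1.0)
```

In this toy input, the second feature has a small joint norm and is zeroed for both tasks at once, while the other two rows are only shrunk. That all-or-nothing behavior per feature is what "similar sparsity pattern" usually refers to in this family of methods.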