Scalable Social Analytics for Live Viral Event Prediction


  • Puneet Jain Duke University
  • Justin Manweiler IBM T. J. Watson Research
  • Arup Acharya IBM T. J. Watson Research
  • Romit Roy Choudhury University of Illinois, Urbana Champaign



Virality, Twitter, YouTube, Real-time processing


Large-scale, predictive social analytics have proven effective. Over the last decade, research and industrial efforts have recognized the potential value of inferences based on online behavior analysis, sentiment mining, influence analysis, epidemic spread, etc. The majority of these efforts, however, are not yet designed with real-time responsiveness as a first-order requirement. Typical systems perform a post-mortem analysis on volumes of historical data and validate their “predictions” against already-occurred events.

We observe that in many applications, real-time predictions are critical, and delays of hours (or even minutes) can reduce their utility. For example, political campaigns could react very quickly to a scandal spreading on Facebook; content distribution networks (CDNs) could prefetch videos that are predicted to go viral soon; and online advertisement campaigns could be corrected to improve consumer reception. This paper proposes CrowdCast, a cloud-based framework that enables real-time analysis and prediction from streaming social data. As an instantiation of this framework, we tune CrowdCast to observe Twitter tweets and predict which YouTube videos are most likely to “go viral” in the near future. To this end, CrowdCast first applies online machine learning to map natural-language tweets to specific YouTube videos. Tweets that indeed refer to videos are then weighted by the perceived “influence” of the sender. Finally, each video’s spread is predicted through a sociological model derived from the emerging structure of the graph over which the video-related tweets are (still) spreading. Combining metrics of influence and live structure, CrowdCast outputs sets of candidate videos identified as likely to become viral in the next few hours. We monitored Twitter for more than 30 days and found that CrowdCast’s real-time predictions show encouraging correlation with actual YouTube viewership in the near future.
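The abstract outlines a three-stage pipeline: map tweets to videos, weight each tweet by its sender's influence, and score each video by the structure of the graph it is spreading over. The Python sketch below illustrates only the shape of that pipeline under explicit assumptions; it is not CrowdCast's implementation. Keyword coverage stands in for the paper's online machine-learning classifier, `log1p(followers)` is a hypothetical influence proxy, and retweet-cascade depth is a toy substitute for the unspecified sociological model. All names (`Tweet`, `match_video`, `cascade_depth`, `rank_videos`, the sample catalog and tweets) are hypothetical.

```python
import math
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tweet:
    sender: str
    followers: int
    text: str
    retweet_of: Optional[str] = None  # sender this tweet was retweeted from, if any


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match_video(tweet: Tweet, catalog: dict[str, set[str]], threshold: float = 0.5) -> Optional[str]:
    """Hypothetical stand-in for an online classifier: return the video whose
    keywords are best covered by the tweet, if coverage reaches the threshold."""
    words = tokenize(tweet.text)
    best_id, best_cov = None, 0.0
    for vid, keywords in catalog.items():
        cov = len(words & keywords) / len(keywords) if keywords else 0.0
        if cov > best_cov:
            best_id, best_cov = vid, cov
    return best_id if best_cov >= threshold else None


def influence(followers: int) -> float:
    # Hypothetical influence proxy: diminishing returns in follower count.
    return math.log1p(followers)


def cascade_depth(tweets: list[Tweet]) -> int:
    """Toy structural metric: longest retweet chain among the video's tweets."""
    parent = {t.sender: t.retweet_of for t in tweets}

    def depth(user: str, seen: set[str]) -> int:
        p = parent.get(user)
        if p is None or p in seen:  # original tweet, or a cycle guard
            return 0
        return 1 + depth(p, seen | {p})

    return max((depth(t.sender, {t.sender}) for t in tweets), default=0)


def rank_videos(tweets: list[Tweet], catalog: dict[str, set[str]], top_k: int = 3):
    # Stage 1: map tweets to videos (unmatched tweets are dropped).
    by_video: dict[str, list[Tweet]] = defaultdict(list)
    for t in tweets:
        vid = match_video(t, catalog)
        if vid is not None:
            by_video[vid].append(t)
    # Stages 2 and 3: influence-weighted volume, boosted by cascade depth.
    scores = {
        vid: sum(influence(t.followers) for t in ts) * (1 + cascade_depth(ts))
        for vid, ts in by_video.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


catalog = {"vid_cat": {"cat", "piano"}, "vid_goal": {"stunning", "goal", "final"}}
tweets = [
    Tweet("alice", 50000, "This cat playing piano is hilarious"),
    Tweet("bob", 200, "RT cat piano lol", retweet_of="alice"),
    Tweet("carol", 30, "cat on the piano again", retweet_of="bob"),
    Tweet("dave", 1000, "stunning goal in the final"),
    Tweet("erin", 10, "what a morning"),
]
ranking = rank_videos(tweets, catalog)
print(ranking)  # vid_cat ranks first, then vid_goal; erin's tweet matches nothing
```

Multiplying influence by a structural factor is one simple way to "combine metrics of influence and live structure"; a real system would need a learned or sociologically grounded combination, and streaming updates rather than a batch recomputation.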




How to Cite

Jain, P., Manweiler, J., Acharya, A., & Roy Choudhury, R. (2014). Scalable Social Analytics for Live Viral Event Prediction. Proceedings of the International AAAI Conference on Web and Social Media, 8(1), 226-235.