Hybrid Browser / Server Collection of Streaming Social Media Data for Scalable Real-Time Analysis

Authors

  • Lance Vick, Tawlk
  • Titus Soporan, Tawlk
  • Daniel Lewis, Levelset Labs
  • Jane Zurn, Virginia Commonwealth University

DOI:

https://doi.org/10.1609/icwsm.v6i3.14353

Keywords:

hyve, synt, kral, sockjs, websocket, ajax, JSON, JSON-P, twitter, digg, facebook, reddit, github, google, google+, jquery, jquery-livestream, tawlk, real-time, social media, haproxy, redis, javascript, python, open source, data mining

Abstract

We present a novel approach to collecting and distributing social media data in web service projects, using both clients and servers for real-time analysis. The result is an inexpensive, scalable method of a quality that has not been available to date. Current challenges to social data mining include vendor-enforced API limits and infrastructure costs. Our hybrid client / server approach allows data to be collected via JavaScript in browsers as well as by servers, which lets applications compute a wide range of data analytics. We present pure client-based and pure server-based collection strategies, then demonstrate how our method has substantial advantages over both. Specific advantages include lower infrastructure requirements and greater efficiency in API utilization. Our approach distributes the majority of data collection tasks to client web browsers while using servers to supply more complex analysis techniques. In addition, we provide details on two open source tools we have released to facilitate implementation by researchers in their own projects. We close with a use case scenario describing a large-scale public web service project, followed by a solution built with our approach and open source tools.
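
The division of labor described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Item` and `Aggregator` names, their fields, and the deduplication rule are all assumptions made for illustration. The idea shown is only that several browser clients may report overlapping batches of collected items, and a server-side merge point keeps each item once before any heavier analysis runs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Item:
    # Hypothetical record shape; the abstract does not define one.
    service: str   # e.g. "twitter"
    item_id: str   # identifier assigned by the source service
    text: str


@dataclass
class Aggregator:
    """Hypothetical server-side merge point for client-reported batches."""
    seen: set = field(default_factory=set)
    items: list = field(default_factory=list)

    def ingest(self, batch):
        """Store items not seen before; return how many were new."""
        added = 0
        for item in batch:
            key = (item.service, item.item_id)
            if key not in self.seen:
                self.seen.add(key)
                self.items.append(item)
                added += 1
        return added


# Two simulated browser clients whose collected batches overlap.
agg = Aggregator()
client_a = [Item("twitter", "1", "hello"), Item("reddit", "9", "post")]
client_b = [Item("twitter", "1", "hello"), Item("twitter", "2", "world")]
new_a = agg.ingest(client_a)
new_b = agg.ingest(client_b)
```

In this sketch the clients bear the cost of the API calls, and the server only pays for storing and analyzing items it has not already received.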

Published

2021-08-03

How to Cite

Vick, L., Soporan, T., Lewis, D., & Zurn, J. (2021). Hybrid Browser / Server Collection of Streaming Social Media Data for Scalable Real-Time Analysis. Proceedings of the International AAAI Conference on Web and Social Media, 6(3), 29-33. https://doi.org/10.1609/icwsm.v6i3.14353