WhichStreams: A Dynamic Approach for Focused Data Capture from Large Social Media
Keywords: Social Media, Data Capture, Machine Learning
Because of the huge amount of data produced on large social media, capturing useful content usually means focusing on subsets of data that fit a pre-specified need. Given the usual API restrictions of these media, we formulate this task of focused capture as a dynamic data source selection problem. We then propose a machine learning methodology, named WhichStreams, which is based on an extension of a recently proposed combinatorial bandit algorithm. The evaluation of our approach on various Twitter datasets, in both offline and online settings, demonstrates the relevance of the proposal for leveraging the real-time data streaming APIs offered by most of the main social media.
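To make the formulation concrete, the following is a minimal sketch of a generic combinatorial UCB selection loop, in which k sources out of n are followed at each round and their estimated usefulness is updated from the observed rewards. It is not the WhichStreams algorithm itself. The function names `select_sources` and `run`, the Bernoulli reward model, and the exploration bonus are all assumptions made for illustration.

```python
import math
import random


def select_sources(counts, means, t, k):
    """Return the indices of the k sources with the highest UCB score."""
    def score(i):
        if counts[i] == 0:
            return float("inf")  # force at least one observation per source
        # Empirical mean plus a standard UCB1-style exploration bonus (assumed).
        return means[i] + math.sqrt(2.0 * math.log(t) / counts[i])
    return sorted(range(len(counts)), key=score, reverse=True)[:k]


def run(true_rates, k, rounds, seed=0):
    """Simulate focused capture with hypothetical Bernoulli rewards."""
    rng = random.Random(seed)
    n = len(true_rates)
    counts = [0] * n   # number of times each source was followed
    means = [0.0] * n  # running mean reward of each source
    for t in range(1, rounds + 1):
        for i in select_sources(counts, means, t, k):
            # Reward 1.0 stands for "the source produced relevant content".
            reward = 1.0 if rng.random() < true_rates[i] else 0.0
            counts[i] += 1
            means[i] += (reward - means[i]) / counts[i]
    return counts, means


if __name__ == "__main__":
    counts, means = run([0.05, 0.1, 0.8, 0.9, 0.2], k=2, rounds=2000)
    print(counts)
    print([round(m, 2) for m in means])
```

In this simulation the cap k plays the role of an API restriction on the number of sources that can be followed simultaneously, and the selection concentrates over time on the sources with the highest reward rates while still occasionally revisiting the others.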