A Bootstrapping Approach to Identifying Relevant Tweets for Social TV

Ovidiu Dan; Junlan Feng; Brian Davison

doi:10.1609/icwsm.v5i1.14195

Authors

Ovidiu Dan, Lehigh University
Junlan Feng, AT&T Labs Research
Brian Davison, Lehigh University

Abstract

Manufacturers of TV sets have recently started adding social media features to their products. Some of these products display microblogging messages relevant to the TV show that the user is currently watching. However, such systems suffer from low precision and recall when they use the title of the show to search for relevant messages. Titles of some popular shows, such as Lost or Survivor, are highly ambiguous, so a title search returns many messages unrelated to the show. Thus, there is a need for filtering algorithms that achieve both high precision and high recall. Filtering microblogging messages for Social TV poses several challenges, including a lack of training data, a lack of proper grammar and capitalization, and a lack of context due to text sparsity. We describe a bootstrapping algorithm that uses a small manually labeled dataset, a large dataset of unlabeled messages, and some domain knowledge to derive a high-precision classifier that can successfully filter microblogging messages that discuss television shows. The classifier is designed to generalize to TV shows that were not part of the training set. The algorithm achieves high precision on our two test datasets and successfully generalizes to unseen television shows. Furthermore, it compares favorably to a text classifier specifically trained on the television shows used for testing.
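The abstract does not spell out the steps of the bootstrapping algorithm, so the following is only a minimal sketch of the general idea of bootstrapping (self-training) from a small labeled seed set and a pool of unlabeled messages. It is not the authors' method: the tiny Naive Bayes model, the function names (`train`, `prob_relevant`, `bootstrap`), the confidence threshold, and the example messages are all assumptions made for illustration, and the domain knowledge mentioned in the abstract is not modeled here.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase whitespace tokenization; tweets often lack reliable capitalization.
    return text.lower().split()


def train(examples):
    """Fit a tiny multinomial Naive Bayes model on (text, is_relevant) pairs."""
    word_counts = {True: Counter(), False: Counter()}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, class_counts, vocab


def prob_relevant(model, text):
    """Return the model's estimate of P(relevant | text), with add-one smoothing."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    log_scores = {}
    for label in (True, False):
        n_words = sum(word_counts[label].values())
        score = math.log((class_counts[label] + 1) / (total + 2))
        for tok in tokenize(text):
            if tok in vocab:  # ignore words never seen in training
                score += math.log(
                    (word_counts[label][tok] + 1) / (n_words + len(vocab))
                )
        log_scores[label] = score
    # Normalize the two log scores into a probability.
    top = max(log_scores.values())
    exp_true = math.exp(log_scores[True] - top)
    exp_false = math.exp(log_scores[False] - top)
    return exp_true / (exp_true + exp_false)


def bootstrap(labeled, unlabeled, threshold=0.8, max_rounds=5):
    """Self-training loop (illustrative): repeatedly promote confident predictions."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = train(labeled)
        confident, remaining = [], []
        for text in pool:
            p = prob_relevant(model, text)
            if p >= threshold:
                confident.append((text, True))
            elif p <= 1 - threshold:
                confident.append((text, False))
            else:
                remaining.append(text)
        if not confident:
            break  # nothing new to learn from; stop early
        labeled.extend(confident)
        pool = remaining
    return train(labeled), labeled


# Hypothetical seed data for the ambiguous show title "Lost".
seed = [
    ("watching lost tonight season finale", True),
    ("new episode of lost on abc", True),
    ("i lost my keys again", False),
    ("lost my phone at the mall", False),
]
pool = ["lost season finale episode tonight", "lost my wallet and keys"]
model, expanded = bootstrap(seed, pool)
```

In this sketch the threshold trades precision for coverage: a higher value promotes fewer unlabeled messages per round but makes it less likely that mislabeled messages contaminate the training set.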