Transfer Latent Semantic Learning: Microblog Mining with Less Supervision

Dan Zhang; Yan Liu; Richard Lawrence; Vijil Chenthamarakshan

doi:10.1609/aaai.v25i1.7916

Authors

Dan Zhang Purdue University
Yan Liu University of Southern California
Richard Lawrence IBM T. J. Watson Research Center
Vijil Chenthamarakshan IBM T. J. Watson Research Center


Abstract

The increasing volume of information generated on micro-blogging sites such as Twitter raises several challenges for traditional text mining techniques. First, most texts from these sites are abbreviated because of the limit on the number of characters in a single post; second, the input usually arrives as large-volume streams. It is therefore important to develop effective and efficient representations of abbreviated texts for better filtering and mining. In this paper, we introduce a novel transfer learning approach, transfer latent semantic learning, that uses a large number of related tagged documents with rich information from other sources (the source domain) to help build a robust latent semantic space for the abbreviated texts (the target domain). This is achieved by simultaneously minimizing the document reconstruction error and the classification error on the labeled source-domain examples, where the classifier is built with a hinge loss in the latent semantic space. We demonstrate the effectiveness of our method by applying it to the tasks of classifying and tagging abbreviated texts. Experimental results on both synthetic datasets and real application datasets, including Reuters-21578 and Twitter data, suggest that our approach substantially improves on existing ones.
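The abstract states the idea only at a high level: one latent space, fitted jointly to reconstruct documents and to support a hinge-loss classifier on labeled source documents. Below is a minimal sketch of that kind of joint objective under several assumptions. A shared basis W reconstructs both the source term-document matrix Xs and the target matrix Xt through latent codes Hs and Ht. A linear classifier (v, b) is scored on the source codes. The total cost is ||Xs - W Hs||^2 + ||Xt - W Ht||^2 + alpha * sum_i max(0, 1 - y_i (v . h_i + b)) plus an L2 penalty with weight lam, and it is minimized by plain subgradient descent. The squared Frobenius loss, the unconstrained factorization, the regularizer, the optimizer, and every name here (transfer_latent_objective, fit, alpha, lam, lr) are illustrative assumptions. This is not the authors' actual formulation or solver.

```python
import numpy as np


def transfer_latent_objective(Xs, ys, Xt, W, Hs, Ht, v, b, alpha, lam):
    """Hypothetical joint objective: squared reconstruction error of source and
    target documents through a shared basis W, plus alpha times the hinge loss
    of a linear classifier (v, b) on the latent codes of labeled source
    documents, plus an L2 penalty with weight lam."""
    recon = np.sum((Xs - W @ Hs) ** 2) + np.sum((Xt - W @ Ht) ** 2)
    scores = v @ Hs + b                      # one score per source document
    hinge = np.sum(np.maximum(0.0, 1.0 - ys * scores))
    reg = lam * (np.sum(W ** 2) + np.sum(Hs ** 2) + np.sum(Ht ** 2) + np.sum(v ** 2))
    return recon + alpha * hinge + reg


def fit(Xs, ys, Xt, k=5, alpha=1.0, lam=0.01, lr=1e-3, iters=500, seed=0):
    """Minimize the hypothetical objective above by (sub)gradient descent.

    Xs: (vocab, n_source) term-document matrix of tagged source documents
    ys: (n_source,) labels in {-1, +1}
    Xt: (vocab, n_target) term-document matrix of short target texts
    Returns the learned parameters and the objective value at each iteration.
    """
    rng = np.random.default_rng(seed)
    vocab = Xs.shape[0]
    W = 0.1 * rng.standard_normal((vocab, k))          # shared latent basis
    Hs = 0.1 * rng.standard_normal((k, Xs.shape[1]))   # source latent codes
    Ht = 0.1 * rng.standard_normal((k, Xt.shape[1]))   # target latent codes
    v = np.zeros(k)                                    # classifier weights
    b = 0.0                                            # classifier bias
    history = []
    for _ in range(iters):
        history.append(transfer_latent_objective(Xs, ys, Xt, W, Hs, Ht, v, b, alpha, lam))
        Rs = W @ Hs - Xs                     # source reconstruction residual
        Rt = W @ Ht - Xt                     # target reconstruction residual
        active = (ys * (v @ Hs + b) < 1.0).astype(float)  # margin violators
        g_score = -alpha * ys * active       # hinge subgradient w.r.t. each score
        # All gradients use the current parameters (simultaneous update).
        gW = 2 * Rs @ Hs.T + 2 * Rt @ Ht.T + 2 * lam * W
        gHs = 2 * W.T @ Rs + np.outer(v, g_score) + 2 * lam * Hs
        gHt = 2 * W.T @ Rt + 2 * lam * Ht
        gv = Hs @ g_score + 2 * lam * v
        gb = np.sum(g_score)
        W -= lr * gW
        Hs -= lr * gHs
        Ht -= lr * gHt
        v -= lr * gv
        b -= lr * gb
    return W, Hs, Ht, v, b, history
```

A call such as `W, Hs, Ht, v, b, history = fit(Xs, ys, Xt, k=4)` returns target codes in the columns of `Ht`. Because the basis W is shaped by both reconstruction and the source labels, these codes could then feed a downstream classifier or tagger. The published method may instead use alternating updates or a different solver.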
