Transfer Latent Semantic Learning: Microblog Mining with Less Supervision

Authors

  • Dan Zhang, Purdue University
  • Yan Liu, University of Southern California
  • Richard Lawrence, IBM T. J. Watson Research Center
  • Vijil Chenthamarakshan, IBM T. J. Watson Research Center

DOI:

https://doi.org/10.1609/aaai.v25i1.7916

Abstract

The increasing volume of information generated on micro-blogging sites such as Twitter poses several challenges for traditional text mining techniques. First, most texts from those sites are abbreviated because of the character limit on each post; second, the input usually arrives in high-volume streams. It is therefore important to develop effective and efficient representations of abbreviated texts for better filtering and mining. In this paper, we introduce a novel transfer learning approach, namely transfer latent semantic learning, that utilizes a large number of related tagged documents with rich information from other sources (source domain) to help build a robust latent semantic space for the abbreviated texts (target domain). This is achieved by simultaneously minimizing the document reconstruction error and the classification error of the labeled examples from the source domain, where the classifier is built with hinge loss in the latent semantic space. We demonstrate the effectiveness of our method by applying it to the task of classifying and tagging abbreviated texts. Experimental results on both synthetic datasets and real application datasets, including Reuters-21578 and Twitter data, suggest substantial improvements using our approach over existing ones.
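
The joint objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' formulation: it assumes a plain linear factorization of the document-term matrix, a single binary linear classifier, and no regularization terms, and all names (`joint_objective`, `X`, `B`, `Z`, `w`, `b`, `lam`) are hypothetical. It only shows how a reconstruction term and a hinge-loss term over labeled source documents might be combined into one quantity to minimize.

```python
import numpy as np

def joint_objective(X, B, Z, w, b, y, labeled_idx, lam=1.0):
    """Reconstruction error plus hinge loss on labeled source documents.

    Illustrative assumptions (not taken from the paper):
      X: (n_docs, n_terms) document-term matrix, source and target stacked.
      B: (n_latent, n_terms) latent basis.
      Z: (n_docs, n_latent) latent representation of each document.
      w, b: linear classifier operating in the latent space.
      y: labels in {-1, +1} for the rows listed in labeled_idx.
      lam: hypothetical trade-off weight between the two terms.
    """
    # Squared reconstruction error over all documents.
    reconstruction = np.sum((X - Z @ B) ** 2)
    # Hinge loss only over the labeled (source-domain) documents.
    margins = y * (Z[labeled_idx] @ w + b)
    hinge = np.sum(np.maximum(0.0, 1.0 - margins))
    return reconstruction + lam * hinge
```

In this sketch, unlabeled target-domain documents contribute only to the reconstruction term, while the labeled source documents also shape the latent space through the classification term. An actual solver would alternate or jointly optimize over `B`, `Z`, `w`, and `b`; that step is omitted here.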

Published

2011-08-04

How to Cite

Zhang, D., Liu, Y., Lawrence, R., & Chenthamarakshan, V. (2011). Transfer Latent Semantic Learning: Microblog Mining with Less Supervision. Proceedings of the AAAI Conference on Artificial Intelligence, 25(1), 561-566. https://doi.org/10.1609/aaai.v25i1.7916

Section

AAAI Technical Track: Machine Learning