Link-PLSA-LDA: A New Unsupervised Model for Topics and Influence of Blogs

Ramesh Nallapati; William Cohen

doi:10.1609/icwsm.v2i1.18621

Authors

Ramesh Nallapati Carnegie Mellon University
William Cohen Carnegie Mellon University


Abstract

In this work, we address the twin problems of unsupervised topic discovery and estimation of the topic-specific influence of blogs. Our goal is a model that can present a user with highly influential blog postings on the topic of the user's interest. We adopt the framework of an unsupervised model called Latent Dirichlet Allocation (LDA), known for its effectiveness in topic discovery. An extension of this model, which we call Link-LDA, defines a generative model for hyperlinks and thereby models the topic-specific influence of documents, which is the problem we address. However, this model does not exploit the topical relationship between the documents on either side of a hyperlink, i.e., the notion that documents tend to link to other documents on the same topic. We propose a new model, called Link-PLSA-LDA, that combines PLSA and LDA into a single framework and explicitly models the topical relationship between the linking and the linked document. On blog data, the new model produces informative visualizations of topics and of the most influential blogs on each topic. We also evaluate the model quantitatively, using the log-likelihood of unseen data and performance on a link prediction task. In both experiments the new model performs better than Link-LDA, suggesting that it models topics and the topic-specific influence of blogs more effectively.
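The abstract does not spell out the generative process, so the following is only a rough sketch of the Link-LDA idea it builds on, assuming the common formulation in which each hyperlink is generated like a word: draw a topic from the document's topic mixture, then draw the linked document from a topic-specific distribution over documents. All names (`alpha`, `beta`, `omega`, `generate_link_lda_doc`) and the toy parameter values are hypothetical, and the sketch does not implement the PLSA component that Link-PLSA-LDA uses for the linked documents. Under this reading, the documents with high probability in row `omega[z]` are the ones most influential on topic `z`.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Return an index drawn with probability proportional to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_link_lda_doc(alpha, beta, omega, n_words, n_links, rng):
    """Generate one document's words and hyperlinks (Link-LDA-style sketch).

    alpha: Dirichlet prior over K topics (hypothetical values).
    beta:  K x V topic-to-word distributions.
    omega: K x D topic-to-linked-document distributions; a high omega[z][d]
           is read here as document d being influential on topic z.
    """
    theta = sample_dirichlet(alpha, rng)  # this document's topic mixture
    words, links = [], []
    for _ in range(n_words):
        z = sample_categorical(theta, rng)       # topic for this word
        words.append(sample_categorical(beta[z], rng))
    for _ in range(n_links):
        z = sample_categorical(theta, rng)       # topic for this hyperlink
        links.append(sample_categorical(omega[z], rng))  # index of linked doc
    return theta, words, links


if __name__ == "__main__":
    rng = random.Random(0)
    alpha = [0.5, 0.5]                           # K = 2 topics
    beta = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]    # V = 3 word types
    omega = [[0.9, 0.1], [0.1, 0.9]]             # D = 2 linkable documents
    theta, words, links = generate_link_lda_doc(alpha, beta, omega, 10, 3, rng)
    print(len(words), len(links))  # 10 3
```

Sharing one topic mixture `theta` between words and links is what lets a link inherit the topic of the text around it; the limitation the abstract points out is that nothing here ties the linked document's own content to that topic.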
