Diagnosing and Improving Topic Models by Analyzing Posterior Variability

Authors

  • Linzi Xing University of Colorado, Boulder
  • Michael Paul University of Colorado, Boulder

DOI:

https://doi.org/10.1609/aaai.v32i1.12033

Abstract

Bayesian inference methods for probabilistic topic models can quantify uncertainty in the parameters, which has primarily been used to increase the robustness of parameter estimates. In this work, we explore other rich information that can be obtained by analyzing the posterior distributions in topic models. Experimenting with latent Dirichlet allocation on two datasets, we propose methods that incorporate information about the posterior distributions at the topic level and at the word level. At the topic level, we propose a metric called topic stability that measures the variability of the topic parameters under the posterior. We show that this metric is correlated with human judgments of topic quality as well as with the consistency of topics appearing across multiple models. At the word level, we experiment with different methods for adjusting individual word probabilities within topics based on their uncertainty. Humans prefer words ranked by our adjusted estimates nearly twice as often when compared to the traditional approach. Finally, we describe how the ideas presented in this work could potentially be applied to other predictive or exploratory models in future work.
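The abstract does not give the exact formulas, but the two ideas can be sketched with simple hypothetical proxies: topic stability as the agreement of a topic's word distribution across posterior samples, and word-level adjustment as a mean probability penalized by posterior spread. The function names and the specific measures (pairwise cosine similarity, mean-minus-standard-deviation) below are illustrative assumptions, not the paper's definitions.

```python
import math

def cosine(u, v):
    # Cosine similarity between two probability vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def topic_stability(samples):
    # Hypothetical proxy for topic stability: mean pairwise cosine
    # similarity of one topic's word distribution across posterior
    # samples. A topic whose parameters barely move under the
    # posterior scores near 1; a volatile topic scores lower.
    sims = [cosine(samples[i], samples[j])
            for i in range(len(samples))
            for j in range(i + 1, len(samples))]
    return sum(sims) / len(sims)

def adjusted_word_scores(samples):
    # One plausible uncertainty adjustment (assumed, not the
    # paper's method): penalize each word's mean probability by
    # its posterior standard deviation, so words whose estimates
    # vary a lot across samples are demoted in the ranking.
    n = len(samples)
    vocab_size = len(samples[0])
    scores = []
    for w in range(vocab_size):
        probs = [s[w] for s in samples]
        mean = sum(probs) / n
        var = sum((p - mean) ** 2 for p in probs) / n
        scores.append(mean - math.sqrt(var))
    return scores
```

For example, three identical posterior samples of a topic yield a stability of 1.0, while three samples that each concentrate on a different word yield a much lower value; the adjusted scores then rank words by how reliably, not just how strongly, they are associated with the topic.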

Published

2018-04-26

How to Cite

Xing, L., & Paul, M. (2018). Diagnosing and Improving Topic Models by Analyzing Posterior Variability. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1). https://doi.org/10.1609/aaai.v32i1.12033