Detecting Topic Drift with Compound Topic Models
Keywords: Topic models, Topic tracking, Topic drift, Trend identification and tracking
The Latent Dirichlet Allocation (LDA) topic model of Blei, Ng, and Jordan (2003) is well established as an effective approach to recovering meaningful topics of conversation from a set of documents. However, a useful analysis of user-generated content is concerned not only with the recovery of topics from a static data set, but also with the evolution of topics over time. We employ a compound topic model (CTM) to track topics across two distinct data sets (i.e., past and present) and to visualize trends in topics over time; we evaluate several metrics for detecting a change in the distribution of topics within a time window; and we illustrate how our approach discovers emerging conversation topics related to current events in real data sets.
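To make the idea of detecting a change in the distribution of topics concrete, the following minimal sketch compares the average topic mixture of a past window with that of a present window. It rests on several assumptions that are not taken from the abstract: the Jensen-Shannon divergence is used purely as an illustrative metric (the abstract does not name the metrics evaluated), the per-document topic proportions are assumed to have been inferred already by some topic model, and the function names `mean_topic_distribution` and `drift_score` are hypothetical.

```python
import math
from typing import List, Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence KL(p || q) in nats; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence: symmetric, and bounded above by ln 2."""
    # The mixture m is positive wherever p or q is positive, so KL(. || m) is finite.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def mean_topic_distribution(doc_topic_rows: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-document topic proportions within one time window."""
    n_docs = len(doc_topic_rows)
    n_topics = len(doc_topic_rows[0])
    return [sum(row[k] for row in doc_topic_rows) / n_docs for k in range(n_topics)]


def drift_score(past_docs: Sequence[Sequence[float]],
                present_docs: Sequence[Sequence[float]]) -> float:
    """Illustrative drift metric (an assumption, not the paper's): JS divergence
    between the mean topic mixtures of the past and present windows."""
    return js_divergence(mean_topic_distribution(past_docs),
                         mean_topic_distribution(present_docs))


if __name__ == "__main__":
    # Hypothetical topic proportions over three topics, one row per document.
    past = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]
    present = [[0.2, 0.2, 0.6], [0.1, 0.3, 0.6]]
    print(f"drift score: {drift_score(past, present):.4f}")
```

A score near zero suggests the two windows share a similar topic mixture, while a score approaching ln 2 indicates that almost no probability mass is shared. Any threshold for declaring drift would have to be chosen empirically for the data at hand.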