Modeling Evolution of Topics in Large-Scale Temporal Text Corpora

Authors

  • Elaheh Momeni, University of Vienna
  • Shanika Karunasekera, University of Melbourne
  • Palash Goyal, University of Southern California
  • Kristina Lerman, University of Southern California

DOI:

https://doi.org/10.1609/icwsm.v12i1.15068

Keywords:

Topics Evolution, Temporal Text Corpora, Dynamic Network, Word Embedding

Abstract

Large temporal text collections provide insights into social and cultural change over time. To quantify changes in topics in these corpora, embedding methods have been used as a diachronic tool. However, they have limited utility for modeling changes in topics because of the stochastic nature of training. We propose a new computational approach for detecting and tracking the temporal evolution of topics in a large collection of texts. The approach combines the advantages of two methods: (1) word embeddings, which learn contextual semantic representations of words from temporal snapshots of the data, and (2) dynamic network analysis, which identifies dynamic topics from dynamic semantic similarity networks built from the embedding models. In experiments on two large temporal data sets from the legal and real estate domains, we show that this approach runs faster (because different snapshots are processed in parallel), uncovers more coherent topics (compared to available dynamic topic modeling approaches), and effectively models topic evolution by leveraging the network structure.
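
The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy two-dimensional vectors stand in for trained word embeddings, connected components stand in for whatever community detection the paper uses, and the similarity threshold, the Jaccard-based matching, and all function names are assumptions made for illustration. Each snapshot is turned into a semantic similarity network on its own, and topics are then linked across snapshots by shared words rather than by comparing vectors from separately trained models.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_network(embeddings, threshold=0.8):
    """Link words whose cosine similarity is at least `threshold` (assumed value)."""
    graph = {word: set() for word in embeddings}
    for a, b in combinations(embeddings, 2):
        if cosine(embeddings[a], embeddings[b]) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def topics(graph):
    """Connected components, a simple stand-in for community detection."""
    seen, found = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        found.append(frozenset(component))
    return found


def jaccard(a, b):
    """Word overlap between two topics."""
    return len(a & b) / len(a | b)


def match_topics(previous, current, min_overlap=0.3):
    """Map each current topic to its best-overlapping previous topic, or None."""
    links = {}
    for topic in current:
        best = max(previous, key=lambda p: jaccard(p, topic), default=None)
        if best is not None and jaccard(best, topic) >= min_overlap:
            links[topic] = best
        else:
            links[topic] = None
    return links


# Hypothetical toy "embeddings" for two temporal snapshots.
snapshot_a = {
    "lease": (1.0, 0.1), "rent": (0.9, 0.2), "tenant": (0.95, 0.15),
    "court": (0.1, 1.0), "appeal": (0.2, 0.9),
}
snapshot_b = {
    "lease": (1.0, 0.1), "rent": (0.9, 0.2), "deposit": (0.85, 0.1),
    "court": (0.1, 1.0), "appeal": (0.2, 0.9),
}

# Each snapshot is processed independently of the other.
topics_a = topics(similarity_network(snapshot_a))
topics_b = topics(similarity_network(snapshot_b))
links = match_topics(topics_a, topics_b)

for topic, parent in links.items():
    print(sorted(topic), "<-", sorted(parent) if parent else None)
```

Because each snapshot's network is built without reference to the others, the per-snapshot steps in this sketch could be run in parallel; only the final matching step needs results from adjacent snapshots.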

Published

2018-06-15

How to Cite

Momeni, E., Karunasekera, S., Goyal, P., & Lerman, K. (2018). Modeling Evolution of Topics in Large-Scale Temporal Text Corpora. Proceedings of the International AAAI Conference on Web and Social Media, 12(1). https://doi.org/10.1609/icwsm.v12i1.15068