The Hybrid Nested/Hierarchical Dirichlet Process and its Application to Topic Modeling with Word Differentiation

Authors

  • Tengfei Ma, The University of Tokyo
  • Issei Sato, The University of Tokyo
  • Hiroshi Nakagawa, The University of Tokyo

DOI:

https://doi.org/10.1609/aaai.v29i1.9591

Keywords:

nested, hierarchical, Dirichlet process

Abstract

The hierarchical Dirichlet process (HDP) is a powerful nonparametric Bayesian approach to modeling grouped data that allows the mixture components in each group to be shared. However, in many cases the groups themselves belong to latent groups (categories), which can strongly affect the modeling. To exploit this unknown category information, we present the hybrid nested/hierarchical Dirichlet process (hNHDP), a prior that blends the desirable aspects of both the HDP and the nested Dirichlet process (NDP). Specifically, we introduce a clustering structure for the groups. The prior distribution for each cluster is a realization of a Dirichlet process. Moreover, the cluster-specific distributions can share a subset of their atoms, and the shared atoms and cluster-specific atoms are generated separately. We apply the hNHDP to document modeling and introduce a mechanism to identify discriminative words and topics. We derive an efficient Markov chain Monte Carlo scheme for posterior inference and present experiments on document modeling.
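
The idea of generating shared and cluster-specific atoms separately can be illustrated with a small sketch. This is not the authors' construction: it assumes a truncated stick-breaking approximation to each Dirichlet process draw, a fixed mixing weight `eps` between shared and cluster-specific atoms, and a symmetric Dirichlet base measure over a toy vocabulary. The function names, the truncation level, and all parameter values are hypothetical choices made for illustration only.

```python
import numpy as np


def stick_breaking(concentration, truncation, rng):
    """Truncated stick-breaking weights approximating one DP draw."""
    betas = rng.beta(1.0, concentration, size=truncation)
    betas[-1] = 1.0  # close the stick so the weights sum to exactly 1
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining


def sample_cluster_distributions(n_clusters, truncation, vocab_size, eps, seed=0):
    """Draw one shared atom set plus one specific atom set per cluster.

    Assumption (not from the abstract): each cluster distribution is the
    mixture eps * shared + (1 - eps) * specific, with a fixed eps.
    """
    rng = np.random.default_rng(seed)
    base = np.ones(vocab_size)  # symmetric Dirichlet base measure over words

    # Shared atoms (topics) are generated once and reused by every cluster.
    shared_atoms = rng.dirichlet(base, size=truncation)
    shared_weights = stick_breaking(1.0, truncation, rng)

    clusters = []
    for _ in range(n_clusters):
        # Cluster-specific atoms are generated separately from the shared ones.
        own_atoms = rng.dirichlet(base, size=truncation)
        own_weights = stick_breaking(1.0, truncation, rng)
        atoms = np.vstack([shared_atoms, own_atoms])
        weights = np.concatenate([eps * shared_weights, (1.0 - eps) * own_weights])
        clusters.append((atoms, weights))
    return shared_atoms, clusters


shared_atoms, clusters = sample_cluster_distributions(
    n_clusters=3, truncation=10, vocab_size=5, eps=0.4
)
```

In this sketch every cluster places total mass `eps` on the same shared topics and the rest on its own topics, which is one simple way to picture how discriminative (cluster-specific) topics could be separated from common ones.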

Published

2015-02-21

How to Cite

Ma, T., Sato, I., & Nakagawa, H. (2015). The Hybrid Nested/Hierarchical Dirichlet Process and its Application to Topic Modeling with Word Differentiation. Proceedings of the AAAI Conference on Artificial Intelligence, 29(1). https://doi.org/10.1609/aaai.v29i1.9591

Section

Main Track: Novel Machine Learning Algorithms