Infinite Gaussian Mixture Modeling with an Improved Estimation of the Number of Clusters

Avi Matza; Yuval Bistritz

doi:10.1609/aaai.v35i10.17079

Authors

Avi Matza Tel-Aviv University
Yuval Bistritz Tel-Aviv University

DOI:

https://doi.org/10.1609/aaai.v35i10.17079

Keywords:

Bayesian Learning

Abstract

Infinite Gaussian mixture modeling (IGMM) is a modeling method that determines all the parameters of a Gaussian mixture model (GMM), including its order. It is well documented that IGMM is a consistent estimator of probability density functions in the sense that, given enough training data from a sufficiently regular probability density function, it converges to the shape of the original density curve. It is also known, however, that IGMM provides an inconsistent estimate of the number of clusters. The current paper shows that this inconsistency takes the form of an overestimation, and we pinpoint the problem as an inherent part of the training algorithm. It stems mostly from a "self-reinforcing feedback," a certain relation between the likelihood function of one of the model hyperparameters (alpha) and the probability of sampling the number of components, that sustains their mutual growth during the Gibbs iterations. We show that this problem can be resolved by using informative priors for alpha, and we propose a modified training procedure that uses the inverse chi-square distribution for this purpose. The modified algorithm successfully recovers the "known" order in all the experiments with synthetic data sets. It also demonstrates good results on real-world databases when compared to other methods used to evaluate model order. Furthermore, the improved performance is attained without undermining the fidelity of estimating the original PDFs and with a significant reduction in computational cost.
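The coupling between alpha and the number of components can be illustrated with a minimal sketch. It is not the paper's training procedure: it assumes the standard Dirichlet-process (Chinese restaurant process) relations, in which the likelihood of alpha given k occupied clusters among n points is proportional to alpha^k Γ(alpha) / Γ(alpha + n), and the expected number of occupied clusters is the sum of alpha / (alpha + i) for i = 0, ..., n - 1. The function names, the grid search, and the value n = 500 are all hypothetical choices made for illustration.

```python
import math


def expected_clusters(alpha, n):
    """Expected number of occupied clusters for n points under a CRP
    with concentration alpha (assumed standard DP result)."""
    return sum(alpha / (alpha + i) for i in range(n))


def log_alpha_likelihood(alpha, k, n):
    """Log-likelihood of alpha given k clusters among n points,
    up to an additive constant (assumed standard DP result)."""
    return k * math.log(alpha) + math.lgamma(alpha) - math.lgamma(alpha + n)


def ml_alpha(k, n, grid=None):
    """Grid-search maximizer of the alpha likelihood (illustrative only)."""
    grid = grid or [0.01 * j for j in range(1, 5001)]
    return max(grid, key=lambda a: log_alpha_likelihood(a, k, n))


n = 500  # hypothetical data set size
for k in (3, 6, 12):
    a = ml_alpha(k, n)
    # A larger k favors a larger alpha, and a larger alpha in turn
    # raises the expected number of clusters.
    print(k, round(a, 2), round(expected_clusters(a, n), 2))
```

Under these assumptions both maps are increasing: more clusters favor a larger alpha, and a larger alpha favors more clusters. With a weakly informative prior on alpha there is little to counteract this mutual growth, which is the kind of relation that an informative prior on alpha is meant to damp.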
