Simultaneous Clustering and Ensemble

Authors

  • Zhiqiang Tao Northeastern University
  • Hongfu Liu Northeastern University
  • Yun Fu Northeastern University

DOI:

https://doi.org/10.1609/aaai.v31i1.10720

Keywords:

Ensemble Clustering, Co-association Matrix, Spectral Clustering

Abstract

Ensemble Clustering (EC) has gained a great deal of attention in data mining and machine learning because it has emerged as an effective and robust clustering framework. Typically, EC methods fuse multiple basic partitions (BPs) into a consensus partition, where each BP is obtained by running a traditional clustering method on the same dataset. One promising direction for ensemble clustering is to derive pairwise similarities from the BPs and then recast the ensemble task as a graph partitioning problem. However, these graph-based methods may suffer from information loss when computing the similarity between data points, because they use only the categorical data provided by the BPs and neglect the rich information in the raw features. This can distort the underlying cluster structure of the original feature space and thus degrade clustering performance. In light of this, we propose a novel Simultaneous Clustering and Ensemble (SCE) framework to alleviate this detrimental effect; SCE employs a similarity matrix computed from the raw features to enhance the co-association matrix summarized from multiple BPs. Two closed-form solutions, both obtained by eigenvalue decomposition, are derived for SCE. Experiments on 16 real-world datasets demonstrate the effectiveness of SCE over traditional clustering methods and state-of-the-art ensemble clustering methods. Moreover, several factors that may affect our method are explored extensively.
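To make the two ingredients above concrete, here is a minimal Python sketch using only the standard library. It builds a co-association matrix from several basic partitions (entry (i, j) is the fraction of BPs that put points i and j in the same cluster) and a similarity matrix from raw features, then mixes them. The function names `co_association`, `gaussian_similarity`, and `blend`, the Gaussian kernel with bandwidth `sigma`, and the fixed convex weight `lam` are all hypothetical choices made for illustration. This is not the SCE formulation, and it does not reproduce SCE's closed-form eigenvalue solutions.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def co_association(partitions: Sequence[Sequence[int]]) -> Matrix:
    """Entry (i, j) is the fraction of basic partitions (BPs) that
    assign points i and j to the same cluster."""
    r = len(partitions)
    n = len(partitions[0])
    if any(len(labels) != n for labels in partitions):
        raise ValueError("all basic partitions must label the same n points")
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    # Normalize co-occurrence counts by the number of BPs.
    return [[c / r for c in row] for row in counts]


def gaussian_similarity(X: Sequence[Sequence[float]], sigma: float = 1.0) -> Matrix:
    """Hypothetical raw-feature similarity: exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    n = len(X)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            W[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return W


def blend(S: Matrix, W: Matrix, lam: float = 0.5) -> Matrix:
    """Hypothetical convex combination (1 - lam) * S + lam * W.
    An illustrative stand-in only; SCE defines its own objective."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [[(1.0 - lam) * s + lam * w for s, w in zip(rs, rw)]
            for rs, rw in zip(S, W)]
```

For example, with BPs `[[0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]`, points 0 and 1 co-occur in all three partitions (entry 1.0), while points 0 and 2 co-occur in only one (entry 1/3). The raw-feature term adds information that the categorical BPs alone discard. In a full pipeline, the resulting matrix would then be handed to a spectral clustering step based on eigenvalue decomposition, which this sketch omits.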

Published

2017-02-12

How to Cite

Tao, Z., Liu, H., & Fu, Y. (2017). Simultaneous Clustering and Ensemble. Proceedings of the AAAI Conference on Artificial Intelligence, 31(1). https://doi.org/10.1609/aaai.v31i1.10720

Section

Main Track: Machine Learning Applications