Scalable Sequential Spectral Clustering

Authors

  • Yeqing Li University of Texas at Arlington
  • Junzhou Huang University of Texas at Arlington
  • Wei Liu Didi Research

DOI:

https://doi.org/10.1609/aaai.v30i1.10298

Keywords:

spectral clustering, large scale, sequential k-means

Abstract

In the past decades, Spectral Clustering (SC) has become one of the most effective clustering approaches. Although it has been widely used, one significant drawback of SC is its expensive computation cost. Many efforts have been devoted to accelerating SC algorithms and promising results have been achieved. However, most of the existing algorithms rely on the assumption that data can be stored in the computer memory. When data cannot fit in the memory, these algorithms will suffer severe performance degradations. In order to overcome this issue, we propose a novel sequential SC algorithm for tackling large-scale clustering with limited computational resources, \textit{e.g.}, memory. We begin with investigating an effective way of approximating the graph affinity matrix via leveraging a bipartite graph. Then we choose a smart graph construction and optimization strategy to avoid random access to data. These efforts lead to an efficient SC algorithm whose memory usage is independent of the number of input data points. Extensive experiments carried out on large datasets demonstrate that the proposed sequential SC algorithm is up to a thousand times faster than the state-of-the-arts.

Downloads

Published

2016-02-21

How to Cite

Li, Y., Huang, J., & Liu, W. (2016). Scalable Sequential Spectral Clustering. Proceedings of the AAAI Conference on Artificial Intelligence, 30(1). https://doi.org/10.1609/aaai.v30i1.10298

Issue

Section

Technical Papers: Machine Learning Methods