Scalable Sequential Spectral Clustering

Yeqing Li; Junzhou Huang; Wei Liu

doi:10.1609/aaai.v30i1.10298

Authors

Yeqing Li University of Texas at Arlington
Junzhou Huang University of Texas at Arlington
Wei Liu Didi Research

DOI:

https://doi.org/10.1609/aaai.v30i1.10298

Keywords:

spectral clustering, large scale, sequential k-means

Abstract

In the past decades, Spectral Clustering (SC) has become one of the most effective clustering approaches. Although it has been widely used, one significant drawback of SC is its expensive computation cost. Many efforts have been devoted to accelerating SC algorithms and promising results have been achieved. However, most of the existing algorithms rely on the assumption that data can be stored in the computer memory. When data cannot fit in the memory, these algorithms will suffer severe performance degradations. In order to overcome this issue, we propose a novel sequential SC algorithm for tackling large-scale clustering with limited computational resources, \textit{e.g.}, memory. We begin with investigating an effective way of approximating the graph affinity matrix via leveraging a bipartite graph. Then we choose a smart graph construction and optimization strategy to avoid random access to data. These efforts lead to an efficient SC algorithm whose memory usage is independent of the number of input data points. Extensive experiments carried out on large datasets demonstrate that the proposed sequential SC algorithm is up to a thousand times faster than the state-of-the-arts.

Scalable Sequential Spectral Clustering

Authors

DOI:

Keywords:

Abstract

Downloads

Published

How to Cite

Issue

Section

Information