Compressed K-Means for Large-Scale Clustering

Authors

  • Xiaobo Shen Nanjing University of Science and Technology
  • Weiwei Liu University of Technology Sydney
  • Ivor Tsang University of Technology Sydney
  • Fumin Shen University of Electronic Science and Technology of China
  • Quan-Sen Sun Nanjing University of Science and Technology

DOI:

https://doi.org/10.1609/aaai.v31i1.10852

Keywords:

Large-scale clustering, k-means, binary code

Abstract

Large-scale clustering has been widely used in many applications, and has received much attention. Most existing clustering methods suffer from both expensive computation and memory costs when applied to large-scale datasets. In this paper, we propose a novel clustering method, dubbed compressed k-means (CKM), for fast large-scale clustering. Specifically, high-dimensional data are compressed into short binary codes, which are well suited for fast clustering. CKM enjoys two key benefits: 1) storage can be significantly reduced by representing data points as binary codes; 2) distance computation is very efficient using Hamming metric between binary codes. We propose to jointly learn binary codes and clusters within one framework. Extensive experimental results on four large-scale datasets, including two million-scale datasets demonstrate that CKM outperforms the state-of-the-art large-scale clustering methods in terms of both computation and memory cost, while achieving comparable clustering accuracy.

Downloads

Published

2017-02-13

How to Cite

Shen, X., Liu, W., Tsang, I., Shen, F., & Sun, Q.-S. (2017). Compressed K-Means for Large-Scale Clustering. Proceedings of the AAAI Conference on Artificial Intelligence, 31(1). https://doi.org/10.1609/aaai.v31i1.10852