Compressed K-Means for Large-Scale Clustering

Xiaobo Shen; Weiwei Liu; Ivor Tsang; Fumin Shen; Quan-Sen Sun

doi:10.1609/aaai.v31i1.10852

Authors

Xiaobo Shen Nanjing University of Science and Technology
Weiwei Liu University of Technology Sydney
Ivor Tsang University of Technology Sydney
Fumin Shen University of Electronic Science and Technology of China
Quan-Sen Sun Nanjing University of Science and Technology

DOI:

https://doi.org/10.1609/aaai.v31i1.10852

Keywords:

Large-scale clustering, k-means, binary code

Abstract

Large-scale clustering has been widely used in many applications, and has received much attention. Most existing clustering methods suffer from both expensive computation and memory costs when applied to large-scale datasets. In this paper, we propose a novel clustering method, dubbed compressed k-means (CKM), for fast large-scale clustering. Specifically, high-dimensional data are compressed into short binary codes, which are well suited for fast clustering. CKM enjoys two key benefits: 1) storage can be significantly reduced by representing data points as binary codes; 2) distance computation is very efficient using Hamming metric between binary codes. We propose to jointly learn binary codes and clusters within one framework. Extensive experimental results on four large-scale datasets, including two million-scale datasets demonstrate that CKM outperforms the state-of-the-art large-scale clustering methods in terms of both computation and memory cost, while achieving comparable clustering accuracy.

Compressed K-Means for Large-Scale Clustering

Authors

DOI:

Keywords:

Abstract

Downloads

Published

How to Cite

Issue

Section

Information