UEQMS: UMAP Embedded Quick Mean Shift Algorithm for High Dimensional Clustering

Authors

  • Abhishek Kumar TCG Crest Kyungpook National University
  • Swagatam Das Indian Statistical Institute
  • Rammohan Mallipeddi Kyungpook national University

DOI:

https://doi.org/10.1609/aaai.v37i7.26011

Keywords:

ML: Clustering, ML: Unsupervised & Self-Supervised Learning

Abstract

The mean shift algorithm is a simple yet very effective clustering method widely used for image and video segmentation as well as other exploratory data analysis applications. Recently, a new algorithm called MeanShift++ (MS++) for low-dimensional clustering was proposed with a speedup of 4000 times over the vanilla mean shift. In this work, starting with a first-of-its-kind theoretical analysis of MS++, we extend its reach to high-dimensional data clustering by integrating the Uniform Manifold Approximation and Projection (UMAP) based dimensionality reduction in the same framework. Analytically, we show that MS++ can indeed converge to a non-critical point. Subsequently, we suggest modifications to MS++ to improve its convergence characteristics. In addition, we propose a way to further speed up MS++ by avoiding the execution of the MS++ iterations for every data point. By incorporating UMAP with modified MS++, we design a faster algorithm, named UMAP embedded quick mean shift (UEQMS), for partitioning data with a relatively large number of recorded features. Through extensive experiments, we showcase the efficacy of UEQMS over other state-of-the-art algorithms in terms of accuracy and runtime.

Downloads

Published

2023-06-26

How to Cite

Kumar, A., Das, S., & Mallipeddi, R. (2023). UEQMS: UMAP Embedded Quick Mean Shift Algorithm for High Dimensional Clustering. Proceedings of the AAAI Conference on Artificial Intelligence, 37(7), 8386-8395. https://doi.org/10.1609/aaai.v37i7.26011

Issue

Section

AAAI Technical Track on Machine Learning II