Clustering - What Both Theoreticians and Practitioners Are Doing Wrong

Shai Ben-David

doi:10.1609/aaai.v32i1.12221

Authors

Shai Ben-David University of Waterloo

Keywords:

clustering, theory, practice, bias, challenges

Abstract

Unsupervised learning is widely recognized as one of the most important challenges facing machine learning today. However, although hundreds of papers on the topic are published every year, the current theoretical understanding and practical implementations of such tasks, in particular of clustering, are very rudimentary. This note focuses on clustering. The first challenge I address is model selection: how should a user pick an appropriate clustering tool for a given clustering problem, and how should the parameters of such an algorithmic tool be tuned? In contrast with other common computational tasks, different clustering algorithms often yield drastically different outcomes on the same input. Therefore, the choice of a clustering algorithm may play a crucial role in the usefulness of the resulting clustering. However, there currently exists no methodical guidance for selecting a clustering tool for a given clustering task. I argue that this problem is severe and describe some recent proposals that aim to address this crucial lacuna.
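The claim that different clustering algorithms can disagree sharply on the same input can be seen on a toy example. The sketch below is illustrative only: the abstract names no specific algorithms, so the choice of single-linkage agglomerative clustering and a two-center Lloyd-style k-means is an assumption, as are the synthetic one-dimensional data and the function names `single_linkage` and `two_means`.

```python
# Illustrative sketch (not from the paper): two standard clustering
# procedures applied to the same synthetic 1-D data with k = 2.

def single_linkage(points, k):
    """Agglomerative clustering: repeatedly merge the two clusters whose
    closest members are nearest to each other, until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


def two_means(points, iterations=100):
    """Lloyd-style k-means with k = 2, initialized deterministically at the
    smallest and largest point (an assumption made for reproducibility)."""
    centers = [min(points), max(points)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for p in points:
            # Assign each point to the nearer center; ties go to center 0.
            nearest = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            groups[nearest].append(p)
        new_centers = [sum(g) / len(g) for g in groups]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(sorted(g) for g in groups)


# Synthetic data: an evenly spaced chain 0..19 plus one slightly distant point.
points = list(range(20)) + [22]

by_linkage = single_linkage(points, 2)
by_kmeans = two_means(points)

print("single linkage:", by_linkage)
print("2-means:       ", by_kmeans)
```

On this input, single linkage isolates the lone point at 22 and keeps the whole chain together, while the k-means variant cuts the chain near its middle. Neither output is wrong in itself; which one is useful depends on the task, which is the model selection problem the abstract raises.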
