Almost Linear Time Density Level Set Estimation via DBSCAN

Authors

  • Hossein Esfandiari Google Research
  • Vahab Mirrokni Google Research
  • Peilin Zhong Columbia University

DOI:

https://doi.org/10.1609/aaai.v35i8.16902

Keywords:

Clustering

Abstract

This work focuses on designing a fast algorithm for λ-density level set estimation via DBSCAN clustering. Previous work (Jiang, ICML’17; Jang and Jiang, ICML’19) shows that, under some natural assumptions, DBSCAN and its variant DBSCAN++ can be used to estimate the λ-density level set with near-optimal Hausdorff distance, i.e., with rate Õ(n^{-1/(2β+D)}). However, to achieve this near-optimal rate, the current fastest DBSCAN algorithm needs near-quadratic running time, which is impractical for very large datasets, where linear or almost linear time algorithms are desirable. Motivated by this, we present a modified DBSCAN algorithm that achieves near-optimal Hausdorff distance for density level set estimation in Õ(n) running time. In our empirical study, our algorithm provides a significant speedup over previous algorithms while achieving comparable solution quality.
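For readers unfamiliar with the setting, the following is a minimal sketch of the baseline idea the abstract refers to, not the authors' almost-linear-time algorithm. It assumes the standard DBSCAN notion of a core point (at least `min_pts` sample points within distance `eps`) and uses the set of core points as a rough level set estimate; the brute-force neighbor count is exactly the quadratic cost the paper aims to avoid. The function names `core_points` and `hausdorff`, the sample data, and the parameter values are all illustrative assumptions.

```python
import math
import random


def core_points(points, eps, min_pts):
    """Return the points that have at least min_pts sample points
    (the point itself included) within distance eps.

    Brute force: every pair is compared, so this takes O(n^2) time.
    """
    cores = []
    for p in points:
        neighbors = sum(1 for q in points if math.dist(p, q) <= eps)
        if neighbors >= min_pts:
            cores.append(p)
    return cores


def hausdorff(a, b):
    """Hausdorff distance between two finite, non-empty point sets."""
    def directed(x, y):
        # Largest distance from a point of x to its nearest point in y.
        return max(min(math.dist(p, q) for q in y) for p in x)
    return max(directed(a, b), directed(b, a))


# Illustrative data (an assumption): a tight cluster plus scattered noise.
random.seed(0)
dense = [(random.gauss(0.0, 0.1), random.gauss(0.0, 0.1)) for _ in range(200)]
noise = [(random.uniform(-3.0, 3.0), random.uniform(-3.0, 3.0)) for _ in range(20)]

# Core points serve as a crude estimate of the high-density region.
estimate = core_points(dense + noise, eps=0.15, min_pts=10)
```

In this sketch, `hausdorff` is the error measure named in the abstract: it could be used to compare `estimate` against a reference set when one is available. Larger `min_pts` or smaller `eps` correspond to a higher density threshold.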

Published

2021-05-18

How to Cite

Esfandiari, H., Mirrokni, V., & Zhong, P. (2021). Almost Linear Time Density Level Set Estimation via DBSCAN. Proceedings of the AAAI Conference on Artificial Intelligence, 35(8), 7349-7357. https://doi.org/10.1609/aaai.v35i8.16902

Section

AAAI Technical Track on Machine Learning I