Fully-Scalable Massively Parallel Algorithm for k-center with Outliers

Authors

  • Di Wu, School of Computer Science and Engineering, Central South University, Changsha 410083, China; Xiangjiang Laboratory, Changsha 410205, China
  • Qilong Feng, School of Computer Science and Engineering, Central South University, Changsha 410083, China; Xiangjiang Laboratory, Changsha 410205, China
  • Junyu Huang, School of Computer Science and Engineering, Central South University, Changsha 410083, China; Xiangjiang Laboratory, Changsha 410205, China
  • Jinhui Xu, Department of Computer Science and Engineering, State University of New York at Buffalo, NY, USA
  • Ziyun Huang, Department of Computer Science and Software Engineering, Penn State Erie, The Behrend College
  • Jianxin Wang, School of Computer Science and Engineering, Central South University, Changsha 410083, China; Xiangjiang Laboratory, Changsha 410205, China; The Hunan Provincial Key Lab of Bioinformatics, Central South University, Changsha 410083, China

DOI:

https://doi.org/10.1609/aaai.v39i20.35454

Abstract

In this paper, we consider the k-center problem with outliers (the (k, z)-center problem) in the context of Massively Parallel Computation (MPC). Existing MPC algorithms for the (k, z)-center problem typically require Ω(k) local space per machine. While this may be feasible when k is small, these algorithms become impractical for large k, where each machine may lack sufficient space for computation. This motivates the study of fully-scalable algorithms with sublinear local space. We propose the first fully-scalable MPC algorithm for the (k, z)-center problem. The main challenge is to design an MPC algorithm that, using sublinear local space, finds inliers close to the optimal clustering centers while keeping the approximation loss bounded. To address this issue, we propose an iterative sampling-based algorithm whose local space is sublinear in the data size. A key component of our approach is an outliers-removal algorithm that adjusts the sample size in each iteration to select inliers as clustering centers. However, the number of discarded inliers grows with each iteration of the outliers-removal algorithm, making it difficult to bound. To address this, we propose a self-adaptive method that automatically adjusts the sample size to account for the different data distributions on each machine, ensuring a lower bound on the sampling success probability. With these techniques, we present an O(log^* n)-approximation MPC algorithm for the (k, z)-center problem in constant-dimensional Euclidean space. The algorithm discards at most (1 + ε)z outliers and completes in O(log log n) computation rounds while using Θ(n^δ) local space per machine.
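To make the objective concrete: in the (k, z)-center problem, we pick k centers, assign every point to its nearest center, discard the z points with the largest assignment distances as outliers, and minimize the maximum remaining distance. The sketch below is a minimal sequential illustration of this objective together with a naive uniform-sampling candidate search; it is not the paper's MPC algorithm (which uses adaptive sample sizes and runs across machines), and the function names are ours.

```python
import random

def kz_center_cost(points, centers, z):
    """(k, z)-center objective: compute each point's distance to its
    nearest center, discard the z largest distances (the outliers),
    and return the maximum remaining distance."""
    dists = sorted(
        min(sum((p - c) ** 2 for p, c in zip(pt, ctr)) ** 0.5 for ctr in centers)
        for pt in points
    )
    return dists[-(z + 1)] if z < len(dists) else 0.0

def sample_kz_center(points, k, z, rounds=20, seed=0):
    """Toy iterative-sampling baseline (not the paper's algorithm):
    repeatedly draw k centers uniformly at random and keep the
    candidate set with the smallest (k, z)-center cost."""
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(rounds):
        centers = rng.sample(points, k)
        cost = kz_center_cost(points, centers, z)
        if cost < best_cost:
            best, best_cost = centers, cost
    return best, best_cost
```

On data with a far outlier, discarding z = 1 point lets the sampled centers cover only the genuine clusters, which is exactly the effect the outliers-removal step in the paper is designed to preserve at MPC scale.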

Published

2025-04-11

How to Cite

Wu, D., Feng, Q., Huang, J., Xu, J., Huang, Z., & Wang, J. (2025). Fully-Scalable Massively Parallel Algorithm for k-center with Outliers. Proceedings of the AAAI Conference on Artificial Intelligence, 39(20), 21519–21527. https://doi.org/10.1609/aaai.v39i20.35454

Section

AAAI Technical Track on Machine Learning VI