Generalizing Fair Clustering to Multiple Groups: Algorithms and Applications

Authors

  • Diptarka Chakraborty National University of Singapore
  • Kushagra Chatterjee National University of Singapore
  • Debarati Das Pennsylvania State University
  • Tien-Long Nguyen Pennsylvania State University

DOI:

https://doi.org/10.1609/aaai.v40i24.39077

Abstract

Clustering is a fundamental task in machine learning and data analysis, but it frequently fails to provide fair representation for various marginalized communities defined by multiple protected attributes -- a shortcoming often caused by biases in the training data. As a result, there is a growing need to enhance the fairness of clustering outcomes, ideally by making minimal modifications, possibly as a post-processing step after conventional clustering. A recent work initiated the study of closest fair clustering, though in a restricted scenario where data points belong to only two groups. In practice, however, data points are typically characterized by many groups, reflecting diverse protected attributes such as age, ethnicity, gender, etc. In this work, we generalize the study of the closest fair clustering problem to settings with an arbitrary number (more than two) of groups. We begin by showing that the problem is NP-hard even when all groups are of equal size -- a stark contrast with the two-group case, for which an exact algorithm exists. Next, we propose near-linear time approximation algorithms that efficiently handle arbitrary-sized multiple groups. Leveraging our closest fair clustering algorithms, we further achieve improved approximation guarantees for the fair correlation clustering problem, advancing the state-of-the-art results. Additionally, we are the first to provide approximation algorithms for the fair consensus clustering problem involving multiple (more than two) groups.

Published

2026-03-14

How to Cite

Chakraborty, D., Chatterjee, K., Das, D., & Nguyen, T.-L. (2026). Generalizing Fair Clustering to Multiple Groups: Algorithms and Applications. Proceedings of the AAAI Conference on Artificial Intelligence, 40(24), 19934-19942. https://doi.org/10.1609/aaai.v40i24.39077

Issue

Section

AAAI Technical Track on Machine Learning I