Navigating Towards Fairness with Data Selection

Authors

  • Yixuan Zhang, Southeast University
  • Zhidong Li, University of Technology Sydney
  • Yang Wang, University of Technology Sydney
  • Fang Chen, University of Technology Sydney
  • Xuhui Fan, Macquarie University
  • Feng Zhou, Renmin University of China; Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing

DOI:

https://doi.org/10.1609/aaai.v39i21.34422

Abstract

Machine learning algorithms often struggle to eliminate biases inherent in data, particularly biases arising from unreliable labels, which poses a significant challenge to ensuring fairness. Existing fairness techniques that address label bias typically modify the model and intervene in the training process, but such interventions lack flexibility on large-scale datasets. To address this limitation, we introduce a data selection method that mitigates label bias efficiently and flexibly and is tailored to practical needs. Our approach uses a zero-shot predictor as a proxy model that simulates training on a clean holdout set. This strategy, supported by peer predictions, ensures the fairness of the proxy model and removes the need for an additional holdout set, a common requirement of previous methods. Without altering the classifier's architecture, our modality-agnostic method selects appropriate training data, and experimental evaluations across diverse datasets show that it handles label bias and improves fairness both efficiently and effectively.
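The abstract describes the method only at a high level. As a hedged illustration of the general idea of proxy-based data selection, the sketch below assumes a zero-shot model has already produced class probabilities (`proxy_probs`) for every training example. It scores each example by how strongly the proxy supports the observed label beyond what it would give a "peer" label drawn from the empirical label distribution (a peer-prediction-style correction), then keeps the top-scoring fraction within each sensitive group. The function names, the scoring rule, and the per-group `keep_frac` rule are all illustrative assumptions, not the authors' algorithm.

```python
import numpy as np


def peer_adjusted_scores(proxy_probs: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Hypothetical scoring rule (an assumption, not the paper's method).

    score_i = p(y_i | x_i) - sum_c prior_c * p(c | x_i)
    The subtracted term is the proxy's expected agreement with a peer label
    drawn from the empirical label distribution, so an example scores highly
    only when the proxy supports its specific label.
    """
    n, num_classes = proxy_probs.shape
    prior = np.bincount(labels, minlength=num_classes) / n
    agreement = proxy_probs[np.arange(n), labels]
    peer_agreement = proxy_probs @ prior
    return agreement - peer_agreement


def select_per_group(scores: np.ndarray, groups: np.ndarray, keep_frac: float) -> np.ndarray:
    """Keep the top `keep_frac` of examples within each sensitive group
    (hypothetical fairness-aware selection rule)."""
    keep = np.zeros(len(scores), dtype=bool)
    for g in np.unique(groups):
        members = np.flatnonzero(groups == g)
        k = max(1, int(round(keep_frac * len(members))))
        # Indices of the k highest-scoring members of group g.
        top = members[np.argsort(-scores[members])[:k]]
        keep[top] = True
    return keep


# Toy usage: proxy_probs stands in for a zero-shot model's outputs (assumed given).
proxy_probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.4, 0.6]])
labels = np.array([0, 1, 1, 1])  # observed, possibly biased labels
groups = np.array([0, 0, 1, 1])  # sensitive attribute per example

scores = peer_adjusted_scores(proxy_probs, labels)
keep = select_per_group(scores, groups, keep_frac=0.5)
print(scores)  # approx [0.6, 0.15, -0.1, 0.05]
print(keep)    # [ True False False  True]
```

Selecting within each group keeps retention rates equal across groups, so the filtered set does not silently drop one group more than another; a single global threshold on the scores would be an equally simple alternative under different assumptions.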

Published

2025-04-11

How to Cite

Zhang, Y., Li, Z., Wang, Y., Chen, F., Fan, X., & Zhou, F. (2025). Navigating Towards Fairness with Data Selection. Proceedings of the AAAI Conference on Artificial Intelligence, 39(21), 22632–22640. https://doi.org/10.1609/aaai.v39i21.34422

Issue

Vol. 39 No. 21

Section

AAAI Technical Track on Machine Learning VII