Fair Bayesian Data Selection via Generalized Discrepancy Measures

Authors

  • Yixuan Zhang, Southeast University
  • Jiabin Luo, Peking University
  • Zhenggang Wang, Southeast University
  • Feng Zhou, Renmin University of China; Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing
  • Quyu Kong, Alibaba Cloud

DOI:

https://doi.org/10.1609/aaai.v40i34.40078

Abstract

Fairness concerns are increasingly critical as machine learning models are deployed in high-stakes applications. While existing fairness-aware methods typically intervene at the model level, they often suffer from high computational costs, limited scalability, and poor generalization. To address these challenges, we propose a Bayesian data selection framework that ensures fairness by aligning group-specific posterior distributions of model parameters and sample weights with a shared central distribution. Our framework supports flexible alignment via various distributional discrepancy measures, including Wasserstein distance, maximum mean discrepancy, and f-divergence, allowing geometry-aware control without imposing explicit fairness constraints. This data-centric approach mitigates group-specific biases in training data and improves fairness in downstream tasks, with theoretical guarantees. Experiments on benchmark datasets show that our method consistently outperforms existing data selection and model-based fairness methods in both fairness and accuracy.
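
To make the idea of aligning group-specific distributions with a shared central distribution concrete, the following is a minimal sketch of two of the discrepancy measures named in the abstract, computed on scalar samples. It is not the authors' implementation. The function names, the sample values, the RBF kernel with a fixed bandwidth, and the use of quantile averaging as the "central" sample are all assumptions made for illustration.

```python
import math


def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples.

    In one dimension the optimal coupling pairs the sorted samples, so the
    distance is the mean absolute difference of the order statistics.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def rbf(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel on scalars; the bandwidth is an assumed default."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mmd2(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared maximum mean discrepancy."""
    kxx = sum(rbf(a, b, bandwidth) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(rbf(a, b, bandwidth) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(rbf(a, b, bandwidth) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy


# Hypothetical posterior draws of one scalar parameter for two groups.
group_a = [0.0, 1.0, 2.0, 3.0]
group_b = [1.0, 2.0, 3.0, 4.0]

# Assumed "central" sample: average the sorted draws (quantile averaging),
# which is the equal-weight Wasserstein barycenter for 1-D empirical samples.
central = [(a + b) / 2 for a, b in zip(sorted(group_a), sorted(group_b))]

for name, group in (("A", group_a), ("B", group_b)):
    print(name, wasserstein_1d(group, central), mmd2(group, central))
```

In this toy setting both groups sit at the same Wasserstein distance from the central sample, which is the kind of balance an alignment term would encourage. How the paper combines such a term with the posterior over model parameters and sample weights is not stated in the abstract and is not modeled here.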

Published

2026-03-14

How to Cite

Zhang, Y., Luo, J., Wang, Z., Zhou, F., & Kong, Q. (2026). Fair Bayesian Data Selection via Generalized Discrepancy Measures. Proceedings of the AAAI Conference on Artificial Intelligence, 40(34), 28483–28491. https://doi.org/10.1609/aaai.v40i34.40078

Section

AAAI Technical Track on Machine Learning XI