Surviving in Diverse Biases: Unbiased Dataset Acquisition in Online Data Market for Fair Model Training

Authors

  • Jiashi Gao, Southern University of Science and Technology (SUSTech), Shenzhen, China
  • Ziwei Wang, Southern University of Science and Technology (SUSTech), Shenzhen, China; University of Birmingham (UoB), UK
  • Xiangyu Zhao, City University of Hong Kong (CityU), Hong Kong SAR, China
  • Xin Yao, Lingnan University (LU), Hong Kong SAR, China
  • Xuetao Wei, Southern University of Science and Technology (SUSTech), Shenzhen, China

DOI:

https://doi.org/10.1609/aies.v7i1.31649

Abstract

Online data markets have emerged as a valuable source of diverse datasets for training machine learning (ML) models. However, datasets from different data providers may exhibit varying levels of bias with respect to sensitive attributes in the population (such as race, sex, age, and marital status). Recent research on dataset acquisition has focused on maximizing accuracy improvements for downstream model training, ignoring the negative impact of biases in the acquired datasets, which can lead to unfair models. Can a consumer obtain an unbiased dataset from datasets with diverse biases? In this work, we propose a fairness-aware data acquisition framework (FAIRDA) that acquires high-quality datasets to maximize both the accuracy and the fairness of the consumer's locally trained classifier while remaining within a limited budget. Because the biases of data commodities remain opaque to consumers, data acquisition in FAIRDA employs explore-exploit strategies. Depending on whether exploration and exploitation are conducted sequentially or alternately, we introduce two algorithms: the knowledge-based offline data acquisition algorithm (KDA) and the reward-based online data acquisition algorithm (RDA). Each algorithm is tailored to specific consumer needs: the former has an advantage in computational efficiency, the latter in robustness. We conduct experiments demonstrating that, compared to existing baselines, the proposed framework steers consumers toward fairer model training under varying market settings.
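The explore-exploit acquisition described in the abstract can be pictured as a bandit-style purchase loop over providers with unknown bias. The sketch below is a minimal illustration only, assuming a UCB1 selection rule, a fixed per-batch cost, and an observed reward in [0, 1] per purchase (e.g., a validation-accuracy gain minus a fairness penalty); the function name `ucb_acquire`, the provider interface, and the reward model are illustrative assumptions, not the paper's KDA or RDA algorithms.

```python
import math

def ucb_acquire(providers, budget, cost=1.0):
    """Explore-exploit dataset acquisition sketch (UCB1 selection).

    `providers` maps a provider name to a callable that, when invoked,
    returns an observed reward in [0, 1] for one purchased batch
    (illustratively: accuracy gain minus a fairness penalty).
    Purchases continue until the next batch would exceed `budget`.
    """
    names = list(providers)
    counts = {n: 0 for n in names}   # batches bought per provider
    totals = {n: 0.0 for n in names} # cumulative observed reward
    spent, t, purchases = 0.0, 0, []

    while spent + cost <= budget:
        t += 1

        def ucb(n):
            # Unexplored providers are tried first (infinite bound).
            if counts[n] == 0:
                return float("inf")
            mean = totals[n] / counts[n]
            return mean + math.sqrt(2.0 * math.log(t) / counts[n])

        choice = max(names, key=ucb)   # highest upper confidence bound
        reward = providers[choice]()   # buy one batch, observe reward
        counts[choice] += 1
        totals[choice] += reward
        spent += cost
        purchases.append(choice)
    return purchases
```

With two hypothetical providers whose rewards differ, the loop first explores every provider once and then concentrates purchases on the one with the better observed accuracy-fairness trade-off, e.g. `ucb_acquire({"A": lambda: 0.9, "B": lambda: 0.1}, budget=10)` buys mostly from `"A"`.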

Published

2024-10-16

How to Cite

Gao, J., Wang, Z., Zhao, X., Yao, X., & Wei, X. (2024). Surviving in Diverse Biases: Unbiased Dataset Acquisition in Online Data Market for Fair Model Training. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 7(1), 451-462. https://doi.org/10.1609/aies.v7i1.31649