Breaking the Global North Stereotype: A Global South-centric Benchmark Dataset for Auditing and Mitigating Biases in Facial Recognition Systems
Abstract
Facial Recognition Systems (FRSs) are being developed and deployed all around the world at unprecedented rates. Most platforms are designed in a limited set of countries yet deployed in other regions, often without adequate checkpoints for region-specific requirements. This is especially problematic for Global South countries, which lack strong legislation to safeguard people affected by the disparate performance of these systems. The unavailability of datasets, a limited understanding of how FRSs function, and the absence of low-resource bias mitigation measures together accentuate the problem. In this work, we propose a self-curated face dataset composed of 6,579 unique male and female sports-persons (cricket players) from eight countries around the world. More than 50% of the individuals are from Global South countries, and the dataset is demographically diverse. To aid adversarial audits and robust model training, we curate four adversarial variants of each image in the dataset, leading to more than 40,000 distinct images. We also use this dataset to benchmark five popular FRSs, both commercial and open-source, for the task of gender prediction (and country prediction for one of the open-source models as an example of red-teaming). Experiments on the commercial FRSs reveal accuracies ranging from 98.2% (Azure) to 38.1% (Face++), with a large disparity between males and females in the Global South (a maximum difference of 38.5% for Face++). All FRSs also exhibit bias between females of the Global North and South (a maximum difference of ~50%). A Grad-CAM analysis shows that the nose, forehead and mouth are the regions of interest for one of the open-source FRSs. Based on this crucial observation, we design simple, low-resource bias mitigation solutions using few-shot and novel contrastive learning techniques; these significantly improve accuracy, reducing the disparity between males and females from 50% to 1.5% in one of the settings. For the red-teaming experiment using the open-source Deepface model, we observe that simple fine-tuning is not very useful, while contrastive learning brings steady benefits.
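As a minimal illustration of the kind of gender-prediction audit described above, the sketch below uses the open-source deepface library (one of the FRSs named in the abstract) to measure per-group accuracy. The directory layout, file naming, and group labels are hypothetical, and the exact return format of DeepFace.analyze() has varied across deepface releases, so this is a sketch under those assumptions, not the authors' exact audit pipeline.

```python
# Hedged sketch of a per-group gender-prediction audit with deepface
# (pip install deepface). Paths and group names below are hypothetical.
from pathlib import Path
from deepface import DeepFace

def audit_gender_accuracy(image_dir: str, true_gender: str) -> float:
    """Return the fraction of images in image_dir predicted as true_gender."""
    images = sorted(Path(image_dir).glob("*.jpg"))
    correct = 0
    for img in images:
        result = DeepFace.analyze(
            img_path=str(img),
            actions=["gender"],
            enforce_detection=False,  # don't raise if no face is detected
        )
        # Recent deepface versions return one dict per detected face;
        # take the first face and its "dominant_gender" ("Man"/"Woman").
        pred = result[0]["dominant_gender"]
        correct += int(pred == true_gender)
    return correct / len(images) if images else 0.0

# Compare accuracy across demographic slices to surface the kind of
# male/female and Global North/South disparities reported in the paper.
for group, label in [("south_female", "Woman"), ("south_male", "Man"),
                     ("north_female", "Woman"), ("north_male", "Man")]:
    acc = audit_gender_accuracy(f"data/{group}", label)
    print(f"{group}: {acc:.1%}")
```

Comparing the per-slice accuracies printed here is the basic disparity measurement the benchmark performs; the paper's reported gaps (e.g., ~50% between Global North and South females) correspond to differences between such slice-level scores.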
Published
2024-10-16
Section
Full Archival Papers