Investigating Social Bias Propagation in Federated Fine-tuning of Large Language Models
DOI:
https://doi.org/10.1609/aaai.v40i46.41316
Abstract
Large language models (LLMs) have achieved remarkable success in many domains, but concerns about data quality and privacy are growing. Federated Learning (FL) offers a privacy-preserving solution by training a model on local clients without sharing data. However, the impact of biased private data on LLMs fine-tuned through FL remains understudied. This work investigates how client-side biased data affects the global model during federated fine-tuning of LLMs. We simulate realistic scenarios in which some clients possess datasets containing social biases (stereotypes, discriminatory language) while others hold clean data, and conduct extensive experiments with popular FL algorithms (FedAvg, FedAdam, and FedProx) and popular LLMs (LLaMA, Mistral, Phi-3, and Gemma) across datasets with varying bias proportions (33%, 66%, 100%). Our findings reveal that 1) FedAdam consistently shows the lowest bias propagation, reducing CrowS-Pairs scores by up to 15% compared to FedAvg; 2) even small amounts of biased data (33%) can significantly influence global model bias; and 3) mixed biased and neutral data distributions lead to 5%-7% higher bias scores than segregated distributions. Additionally, we propose Bias-Aware Model Aggregation (BAMA), a novel debiasing method for federated fine-tuning that consistently reduces bias across various models and algorithms.
Published
2026-03-14
How to Cite
Zhao, J., Fang, M., Zhong, M., Zheng, S., Chen, L., & Pechenizkiy, M. (2026). Investigating Social Bias Propagation in Federated Fine-tuning of Large Language Models. Proceedings of the AAAI Conference on Artificial Intelligence, 40(46), 39637–39645. https://doi.org/10.1609/aaai.v40i46.41316
Issue
Section
AAAI Special Track on AI for Social Impact II