Investigating Social Bias Propagation in Federated Fine-tuning of Large Language Models

Authors

  • Jiaxu Zhao, Eindhoven University of Technology; EPFL
  • Meng Fang, University of Liverpool; Eindhoven University of Technology
  • Mingze Zhong, University of Technology Sydney
  • Shunfeng Zheng, University of Technology Sydney
  • Ling Chen, University of Technology Sydney
  • Mykola Pechenizkiy, Eindhoven University of Technology

DOI:

https://doi.org/10.1609/aaai.v40i46.41316

Abstract

Large language models (LLMs) have achieved remarkable success across many domains, but concerns about data quality and privacy are growing. Federated Learning (FL) offers a privacy-preserving solution by training a model on local clients without sharing their data. However, the impact of biased private data on LLMs fine-tuned through FL remains understudied. This work investigates how client-side biased data affects the global model during federated fine-tuning of LLMs. Through extensive experiments with popular FL algorithms (FedAvg, FedAdam, and FedProx) and widely used LLMs (LLaMA, Mistral, Phi-3, and Gemma) across datasets with varying bias proportions (33%, 66%, 100%), we simulate realistic scenarios in which some clients hold datasets containing social biases (stereotypes, discriminatory language) while others hold clean data. Our findings reveal that (1) FedAdam consistently shows the lowest bias propagation, reducing CrowS-Pairs scores by up to 15% compared to FedAvg; (2) even small amounts of biased data (33%) can significantly influence global model bias; and (3) mixed biased-and-neutral data distributions lead to 5%-7% higher bias scores than segregated distributions. Additionally, we propose Bias-Aware Model Aggregation (BAMA), a novel debiasing method for federated fine-tuning that consistently reduces bias across various models and algorithms.
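The abstract does not spell out how BAMA aggregates client updates. As a minimal sketch of the general idea behind bias-aware aggregation, the snippet below down-weights clients with higher measured bias (e.g., a CrowS-Pairs-style score) before averaging their parameters. The function name, the softmax weighting scheme, and the `temperature` parameter are all illustrative assumptions, not the authors' actual method:

```python
import numpy as np

def bias_aware_aggregate(client_models, bias_scores, temperature=1.0):
    """Aggregate client parameter dicts into a global model, giving
    less-biased clients larger weights via a softmax over negative
    bias scores. (Hypothetical sketch, not the paper's BAMA.)

    client_models: list of {param_name: np.ndarray}
    bias_scores:   list of floats, higher = more biased
    """
    scores = np.asarray(bias_scores, dtype=float)
    logits = -scores / temperature              # lower bias -> higher logit
    weights = np.exp(logits - logits.max())     # numerically stable softmax
    weights /= weights.sum()

    aggregated = {}
    for name in client_models[0]:
        aggregated[name] = sum(
            w * m[name] for w, m in zip(weights, client_models)
        )
    return aggregated, weights

# Example: two clients, the second flagged as more biased.
clients = [{"w": np.array([1.0, 1.0])}, {"w": np.array([3.0, 3.0])}]
agg, w = bias_aware_aggregate(clients, bias_scores=[0.2, 0.8])
```

With these scores the cleaner client receives roughly 65% of the weight, so the aggregated parameters land below the plain FedAvg mean of 2.0; setting `temperature` large recovers near-uniform (FedAvg-like) averaging.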

Published

2026-03-14

How to Cite

Zhao, J., Fang, M., Zhong, M., Zheng, S., Chen, L., & Pechenizkiy, M. (2026). Investigating Social Bias Propagation in Federated Fine-tuning of Large Language Models. Proceedings of the AAAI Conference on Artificial Intelligence, 40(46), 39637–39645. https://doi.org/10.1609/aaai.v40i46.41316