FedAdamW: A Communication-Efficient Optimizer with Convergence and Generalization Guarantees for Federated Large Models

Authors

  • Junkang Liu Tianjin University
  • Fanhua Shang Tianjin University
  • Hongying Liu Tianjin University
  • Yuxuan Tian Institute of Automation, Chinese Academy of Sciences
  • Yuanyuan Liu Xidian University
  • Jin Liu Xi'an University of Electronic Science and Technology
  • Kewen Zhu Tianjin University
  • Zhouchen Lin Peking University, Pazhou Laboratory (Huangpu)

DOI:

https://doi.org/10.1609/aaai.v40i28.39549

Abstract

AdamW has become one of the most effective optimizers for training large-scale models. We have also observed its effectiveness in the context of federated learning (FL). However, directly applying AdamW in federated learning settings poses significant challenges: (1) due to data heterogeneity, AdamW often yields high variance in the second-moment estimate v; (2) the local overfitting of AdamW may cause client drift; and (3) reinitializing the moment estimates (v, m) at each round slows down convergence. To address these challenges, we propose the first Federated AdamW algorithm, called FedAdamW, for training and fine-tuning various large models. FedAdamW aligns local updates with the global update using both a local correction mechanism and decoupled weight decay to mitigate local overfitting. FedAdamW efficiently aggregates the mean of the second-moment estimates to reduce their variance and uses it to reinitialize them. Theoretically, we prove that FedAdamW achieves a linear speedup convergence rate of $\mathcal{O}\big(\sqrt{(L\Delta\sigma_l^2)/(SKR\epsilon^2)} + (L\Delta)/R\big)$ without any heterogeneity assumption, where S is the number of participating clients per round, K is the number of local iterations, and R is the total number of communication rounds. We also employ PAC-Bayesian generalization analysis to explain the effectiveness of decoupled weight decay in local training. Empirically, we validate the effectiveness of FedAdamW on language and vision Transformer models. Compared to several baselines, FedAdamW significantly reduces communication rounds and improves test accuracy.
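
The abstract names three mechanisms: a local correction toward the global update, decoupled weight decay, and aggregation of the mean of the second-moment estimates. The NumPy sketch below is one possible reading of how they could fit together on a toy quadratic problem; it is not the authors' implementation. The function names (`local_step`, `aggregate_second_moment`, `run_round`), the correction weight `alpha`, the choice to send one scalar mean of v per client, and the use of a cumulative step count for the bias correction of v are all assumptions made for illustration.

```python
import numpy as np

def local_step(x, grad, m, v, k, t_total, global_dir,
               lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8,
               weight_decay=0.01, alpha=0.5):
    """One local AdamW-style step with a global correction term (hypothetical form)."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # m is reset every round, so it is bias-corrected with the local step k.
    m_hat = m / (1 - beta1 ** k)
    # v is warm-started from the server, so we assume the cumulative step count here.
    v_hat = v / (1 - beta2 ** t_total)
    # Assumed correction: blend the adaptive step with an estimate of the global direction.
    step = m_hat / (np.sqrt(v_hat) + eps) + alpha * global_dir
    # Decoupled weight decay: applied to x directly, not folded into the gradient.
    x = x - lr * (step + weight_decay * x)
    return x, m, v

def aggregate_second_moment(client_vs):
    """Each client reports only the scalar mean of its v; the server averages them."""
    return float(np.mean([v.mean() for v in client_vs]))

def run_round(x_global, v_bar, global_dir, targets, r, K=5, lr=0.01):
    """One communication round; client i minimizes 0.5 * ||x - targets[i]||^2."""
    new_xs, new_vs = [], []
    for target in targets:
        x = x_global.copy()
        m = np.zeros_like(x)          # first moment restarts from zero
        v = np.full_like(x, v_bar)    # second moment restarts from the aggregated mean
        for k in range(1, K + 1):
            grad = x - target
            x, m, v = local_step(x, grad, m, v, k, r * K + k, global_dir, lr=lr)
        new_xs.append(x)
        new_vs.append(v)
    x_new = np.mean(new_xs, axis=0)
    # Estimate of the global descent direction, reused as the next round's correction.
    new_dir = (x_global - x_new) / (K * lr)
    return x_new, aggregate_second_moment(new_vs), new_dir

if __name__ == "__main__":
    targets = [np.array([1.0, 2.0, 3.0]), np.array([3.0, 2.0, 1.0])]
    x, v_bar, g_dir = np.zeros(3), 0.0, np.zeros(3)
    for r in range(20):
        x, v_bar, g_dir = run_round(x, v_bar, g_dir, targets, r)
    print(x, v_bar)
```

Sending a single scalar per client keeps the extra communication negligible compared with transmitting the full v vector, which is the trade-off this sketch is meant to illustrate; a per-block mean would be a natural refinement under the same assumptions.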

Published

2026-03-14

How to Cite

Liu, J., Shang, F., Liu, H., Tian, Y., Liu, Y., Liu, J., … Lin, Z. (2026). FedAdamW: A Communication-Efficient Optimizer with Convergence and Generalization Guarantees for Federated Large Models. Proceedings of the AAAI Conference on Artificial Intelligence, 40(28), 23748–23756. https://doi.org/10.1609/aaai.v40i28.39549

Section

AAAI Technical Track on Machine Learning V