On the Convergence of Communication-Efficient Local SGD for Federated Learning
DOI:
https://doi.org/10.1609/aaai.v35i9.16920
Keywords:
Distributed Machine Learning & Federated Learning
Abstract
Federated Learning (FL) has attracted increasing attention in recent years. A leading training algorithm in FL is local SGD, which updates the model parameters on each worker and only periodically averages them across workers. Although it requires fewer communication rounds than classical parallel SGD, local SGD still incurs substantial communication overhead in each round for large machine learning models such as deep neural networks. To address this issue, we propose a new communication-efficient distributed SGD method, which significantly reduces the communication cost through an error-compensated double compression mechanism. Under the non-convex setting, our theoretical results show that our approach has better communication complexity than existing methods and enjoys the same linear speedup with respect to the number of workers as full-precision local SGD. Moreover, we propose a communication-efficient distributed SGD with momentum to accelerate convergence, which also has better communication complexity than existing methods and enjoys a linear speedup with respect to the number of workers. Finally, extensive experiments are conducted to verify the performance of our two proposed methods.
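To make the abstract's description concrete, below is a minimal NumPy sketch of local SGD combined with error-compensated compression applied in both directions (worker-to-server and server-to-worker). It is an illustration under assumed choices, not the paper's algorithm: the compressor (a hypothetical top_k operator), the step sizes, and all function names are placeholders introduced here.

```python
# Minimal sketch: local SGD with error-compensated double compression.
# All names (top_k, local_sgd_ec, local_steps, ...) are illustrative,
# not taken from the paper.
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries of v; zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def local_sgd_ec(grad_fn, x0, num_workers=4, rounds=50, local_steps=5,
                 lr=0.1, k=2, rng=None):
    rng = rng or np.random.default_rng(0)
    x = x0.copy()                                  # shared (server) model
    worker_err = [np.zeros_like(x0) for _ in range(num_workers)]
    server_err = np.zeros_like(x0)                 # server-side error memory
    for _ in range(rounds):
        msgs = []
        for w in range(num_workers):
            xw = x.copy()
            for _ in range(local_steps):           # local SGD updates
                xw -= lr * grad_fn(xw, rng)
            delta = x - xw                         # local model change
            msg = top_k(delta + worker_err[w], k)  # compress with error compensation
            worker_err[w] = delta + worker_err[w] - msg
            msgs.append(msg)
        avg = np.mean(msgs, axis=0)
        back = top_k(avg + server_err, k)          # compress the broadcast as well
        server_err = avg + server_err - back
        x -= back                                  # every worker applies the same update
    return x

# Toy usage: noisy gradients of f(x) = 0.5 * ||x||^2
grad = lambda x, rng: x + 0.01 * rng.standard_normal(x.shape)
print(local_sgd_ec(grad, np.ones(10)))
```

The error memories accumulate whatever the compressor discards and re-inject it in later rounds, which is what allows aggressive compression on both links without the residual bias of naive compression.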
Published
2021-05-18
How to Cite
Gao, H., Xu, A., & Huang, H. (2021). On the Convergence of Communication-Efficient Local SGD for Federated Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 35(9), 7510-7518. https://doi.org/10.1609/aaai.v35i9.16920
Issue
Section
AAAI Technical Track on Machine Learning II