Stability-Based Generalization Analysis of the Asynchronous Decentralized SGD
DOI:
https://doi.org/10.1609/aaai.v37i6.25894
Keywords:
ML: Learning Theory, ML: Deep Learning Theory, ML: Distributed Machine Learning & Federated Learning, ML: Optimization
Abstract
The generalization ability often determines the success of machine learning algorithms in practice. Therefore, it is of great theoretical and practical importance to understand and bound the generalization error of machine learning algorithms. In this paper, we provide the first generalization results for the popular stochastic gradient descent (SGD) algorithm in the distributed asynchronous decentralized setting. Our analysis is based on the uniform stability tool, where stability means that the learned model does not change much under small perturbations of the training set. Under some mild assumptions, we perform a comprehensive generalizability analysis of the asynchronous decentralized SGD, including generalization error and excess generalization error bounds for the strongly convex, convex, and non-convex cases. Our theoretical results reveal the effects of the learning rate, training data size, training iterations, decentralized communication topology, and asynchronous delay on the generalization performance of the asynchronous decentralized SGD. We also study the optimization error regarding the objective function values and investigate how the initial point affects the excess generalization error. Finally, we conduct extensive experiments on MNIST, CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets to validate the theoretical findings.
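A minimal sketch of the two ingredients named in the abstract may help; the notation below (mixing matrix $W$, delay $\tau_i^t$, stability parameter $\epsilon$) is a standard, assumed formulation and not necessarily the paper's exact one. In asynchronous decentralized SGD, each worker $i$ averages the models of its neighbors through a doubly stochastic mixing matrix $W$ and applies a stochastic gradient that may be computed at a stale iterate because of the asynchronous delay $\tau_i^t$:

\[
  x_i^{t+1} \;=\; \sum_{j=1}^{m} W_{ij}\, x_j^{t} \;-\; \eta_t\, \nabla f\!\left(x_i^{\,t-\tau_i^{t}};\, \xi_i^{t}\right).
\]

Uniform stability, the analysis tool cited in the abstract, asks that swapping a single training example change the learned model's loss by at most $\epsilon$ at every test point: an algorithm $A$ is $\epsilon$-uniformly stable if, for all datasets $S, S'$ differing in one example and all points $z$,

\[
  \mathbb{E}_{A}\!\left[\, f\!\left(A(S); z\right) - f\!\left(A(S'); z\right) \right] \;\le\; \epsilon,
\]

in which case the expected generalization error is bounded by $\epsilon$. The paper's bounds then track how this stability parameter depends on the learning rate, the training data size, the number of iterations, the communication topology, and the asynchronous delay.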
Published
2023-06-26
How to Cite
Deng, X., Sun, T., Li, S., & Li, D. (2023). Stability-Based Generalization Analysis of the Asynchronous Decentralized SGD. Proceedings of the AAAI Conference on Artificial Intelligence, 37(6), 7340-7348. https://doi.org/10.1609/aaai.v37i6.25894
Issue
Vol. 37 No. 6 (2023)
Section
AAAI Technical Track on Machine Learning I