Yu H, Yang S, Zhu S. Parallel Restarted SGD with Faster Convergence and Less Communication: Demystifying Why Model Averaging Works for Deep Learning. AAAI [Internet]. 2019 Jul. 17 [cited 2026 Jul. 19];33(01):5693-700. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/4514