1.
Yu H, Yang S, Zhu S. Parallel Restarted SGD with Faster Convergence and Less Communication: Demystifying Why Model Averaging Works for Deep Learning. AAAI [Internet]. 2019 Jul 17 [cited 2024 Jul 13];33(01):5693-700. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/4514