[1] H. Yu, S. Yang, and S. Zhu, “Parallel Restarted SGD with Faster Convergence and Less Communication: Demystifying Why Model Averaging Works for Deep Learning,” AAAI, vol. 33, no. 01, pp. 5693-5700, Jul. 2019.