Yu, Hao, Sen Yang, and Shenghuo Zhu. 2019. “Parallel Restarted SGD with Faster Convergence and Less Communication: Demystifying Why Model Averaging Works for Deep Learning.” Proceedings of the AAAI Conference on Artificial Intelligence 33 (01): 5693–5700. https://doi.org/10.1609/aaai.v33i01.33015693.