Yu, H., Yang, S., & Zhu, S. (2019). Parallel Restarted SGD with Faster Convergence and Less Communication: Demystifying Why Model Averaging Works for Deep Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01), 5693-5700. https://doi.org/10.1609/aaai.v33i01.33015693