Yu, H., Yang, S. and Zhu, S. (2019) “Parallel Restarted SGD with Faster Convergence and Less Communication: Demystifying Why Model Averaging Works for Deep Learning”, Proceedings of the AAAI Conference on Artificial Intelligence, 33(01), pp. 5693–5700. doi: 10.1609/aaai.v33i01.33015693.