[1] Yu, H., Yang, S. and Zhu, S. 2019. Parallel Restarted SGD with Faster Convergence and Less Communication: Demystifying Why Model Averaging Works for Deep Learning. Proceedings of the AAAI Conference on Artificial Intelligence. 33, 01 (Jul. 2019), 5693-5700. DOI:https://doi.org/10.1609/aaai.v33i01.33015693.