Yu, H., S. Yang, and S. Zhu. “Parallel Restarted SGD With Faster Convergence and Less Communication: Demystifying Why Model Averaging Works for Deep Learning”. Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, July 2019, pp. 5693-5700, doi:10.1609/aaai.v33i01.33015693.