Yu, Hao, Sen Yang, and Shenghuo Zhu. “Parallel Restarted SGD With Faster Convergence and Less Communication: Demystifying Why Model Averaging Works for Deep Learning.” Proceedings of the AAAI Conference on Artificial Intelligence 33, no. 01 (July 17, 2019): 5693-5700. Accessed March 29, 2024. https://ojs.aaai.org/index.php/AAAI/article/view/4514.