Elastic Consistency: A Practical Consistency Model for Distributed Stochastic Gradient Descent

Authors

  • Giorgi Nadiradze, IST Austria
  • Ilia Markov, IST Austria
  • Bapi Chatterjee, IST Austria
  • Vyacheslav Kungurtsev, Czech Technical University in Prague
  • Dan Alistarh, IST Austria

Keywords:

Distributed Machine Learning & Federated Learning

Abstract

One key element behind the recent progress of machine learning has been the ability to train machine learning models in large-scale distributed shared-memory and message-passing environments. Most of these models are trained using variants of stochastic gradient descent (SGD), and most distributed implementations relax consistency relative to sequential SGD in some way, to mitigate its large communication or synchronization costs at scale. In this paper, we introduce a general consistency condition covering communication-reduced and asynchronous distributed SGD implementations. Our framework, called elastic consistency, decouples the system-specific aspects of the implementation from the SGD convergence requirements, giving a general way to obtain convergence bounds for a wide variety of distributed SGD methods used in practice. Elastic consistency can be used to re-derive or improve several previous convergence bounds in message-passing and shared-memory settings, and also to analyze new models and distribution schemes. As a direct application, we propose and analyze a new synchronization-avoiding scheduling scheme for distributed SGD, and show that it can be used to efficiently train deep convolutional models for image classification.
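The abstract does not state the consistency condition itself, so the sketch below is only an illustration of the kind of quantity such a condition constrains, under assumptions of our own: a bounded-staleness model in which each step computes a clipped stochastic gradient at a stale view v_t = x_{t-s} of the parameters, with delay s at most tau. Because x_t - v_t is then a sum of at most tau updates, each of norm at most lr * grad_bound, the gap obeys ||x_t - v_t|| <= lr * tau * grad_bound, a bound proportional to the learning rate. The quadratic objective, the noise level, and the helper names `clip` and `stale_sgd` are hypothetical; the paper's exact definition and constants are not reproduced here.

```python
import math
import random


def clip(vec, max_norm):
    """Rescale vec so its Euclidean norm is at most max_norm (hypothetical helper)."""
    n = math.sqrt(sum(v * v for v in vec))
    if n <= max_norm:
        return list(vec)
    scale = max_norm / n
    return [v * scale for v in vec]


def stale_sgd(steps=500, dim=4, lr=0.05, tau=3, grad_bound=2.0, seed=0):
    """Simulate SGD where each gradient is computed at a view up to tau steps old.

    Assumed setting (illustrative only): objective f(x) = 0.5 * ||x||^2,
    Gaussian gradient noise, and gradients clipped to norm grad_bound.
    Returns (largest observed ||x_t - v_t||, analytic bound lr * tau * grad_bound).
    """
    rng = random.Random(seed)
    history = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]]  # history[t] = x_t
    max_gap = 0.0
    for t in range(steps):
        x_t = history[-1]
        # The worker reads a stale view v_t = x_{t - delay}, with delay <= tau.
        delay = rng.randint(0, min(tau, t))
        view = history[-1 - delay]
        gap = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_t, view)))
        max_gap = max(max_gap, gap)
        # Noisy gradient of 0.5 * ||x||^2 evaluated at the stale view, then clipped.
        grad = clip([v + rng.gauss(0.0, 0.5) for v in view], grad_bound)
        # The update is applied to the current iterate, not to the view.
        history.append([a - lr * g for a, g in zip(x_t, grad)])
    return max_gap, lr * tau * grad_bound


if __name__ == "__main__":
    observed, bound = stale_sgd()
    print(f"max ||x_t - v_t|| = {observed:.4f}, bound = {bound:.4f}")
```

Setting `tau=0` recovers sequential SGD, where the view always equals the iterate and the gap is exactly zero; larger `tau` or `lr` loosens the bound linearly.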

Published

2021-05-18

How to Cite

Nadiradze, G., Markov, I., Chatterjee, B., Kungurtsev, V., & Alistarh, D. (2021). Elastic Consistency: A Practical Consistency Model for Distributed Stochastic Gradient Descent. Proceedings of the AAAI Conference on Artificial Intelligence, 35(10), 9037-9045. Retrieved from https://ojs.aaai.org/index.php/AAAI/article/view/17092

Section

AAAI Technical Track on Machine Learning III