The Implicit Regularization of Momentum Gradient Descent in Overparametrized Models

Authors

  • Li Wang Northeast Normal University
  • Zhiguo Fu Northeast Normal University
  • Yingcong Zhou Northeast Normal University
  • Zili Yan Beihua University

DOI:

https://doi.org/10.1609/aaai.v37i8.26209

Keywords:

ML: Deep Learning Theory, ML: Classification and Regression, ML: Deep Neural Network Algorithms, ML: Optimization, ML: Transparent, Interpretable, Explainable ML

Abstract

The study of the implicit regularization induced by gradient-based optimization in deep learning is a long-standing pursuit. In this paper, we characterize the implicit regularization of momentum gradient descent (MGD) in the continuous-time view, the so-called momentum gradient flow (MGF). We show that the components of the weight vector of a deep linear neural network are learned at different evolution rates, and that this evolution gap increases with depth. First, we show that when the depth equals one, the evolution gap between the weight-vector components is linear, which is consistent with the behavior of ridge regression. In particular, we establish a tight coupling between MGF and ridge regression for least squares: when the regularization parameter of ridge is inversely proportional to the square of the time parameter of MGF, the risk of MGF is no more than 1.54 times that of ridge, and their relative Bayesian risks are almost indistinguishable. Second, if the model becomes deeper, i.e., the depth is greater than or equal to 2, the evolution gap becomes more significant, which implies an implicit bias toward sparse solutions. Numerical experiments strongly support our theoretical results.
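The MGF/ridge coupling in the abstract can be sketched numerically: run heavy-ball momentum gradient descent on a least-squares loss from a zero initialization and compare the iterate at (effective) time t with the ridge solution whose penalty scales like 1/t². This is an illustrative sketch, not the paper's code: the step size, momentum coefficient, time rescaling, and pairing constant below are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 10
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

def mgd(X, y, lr=1e-3, beta=0.9, steps=2000):
    """Heavy-ball momentum gradient descent on the least-squares loss, from zero."""
    w = np.zeros(X.shape[1])
    v = np.zeros_like(w)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        v = beta * v - lr * grad
        w = w + v
    return w

def ridge(X, y, lam):
    """Closed-form ridge regression estimator."""
    return np.linalg.solve(X.T @ X / len(y) + lam * np.eye(X.shape[1]),
                           X.T @ y / len(y))

for steps in (500, 2000, 8000):
    t = steps * 1e-3 / (1 - 0.9)       # effective continuous time (assumed rescaling)
    w_m = mgd(X, y, steps=steps)
    w_r = ridge(X, y, lam=1.0 / t**2)  # pairing constant 1.0 is illustrative
    print(steps, np.linalg.norm(w_m - w_r) / np.linalg.norm(w_r))
```

As t grows, the paired penalty 1/t² shrinks and both estimators approach the least-squares solution; the printed relative distances give a rough sense of how closely the paired pair track each other along the way.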

Published

2023-06-26

How to Cite

Wang, L., Fu, Z., Zhou, Y., & Yan, Z. (2023). The Implicit Regularization of Momentum Gradient Descent in Overparametrized Models. Proceedings of the AAAI Conference on Artificial Intelligence, 37(8), 10149-10156. https://doi.org/10.1609/aaai.v37i8.26209

Section

AAAI Technical Track on Machine Learning III