Fast Convergence in Learning Two-Layer Neural Networks with Separable Data

Authors

  • Hossein Taheri, University of California, Santa Barbara
  • Christos Thrampoulidis, University of British Columbia

DOI:

https://doi.org/10.1609/aaai.v37i8.26186

Keywords:

ML: Optimization, ML: Classification and Regression, ML: Learning Theory

Abstract

Normalized gradient descent has shown substantial success in speeding up the convergence of exponentially-tailed loss functions (a class that includes the exponential and logistic losses) on linear classifiers with separable data. In this paper, we go beyond linear models by studying normalized GD on two-layer neural networks. We prove for exponentially-tailed losses that normalized GD achieves a linear rate of convergence of the training loss to the global optimum. This is made possible by establishing certain gradient self-boundedness conditions and a log-Lipschitzness property. We also study the generalization of normalized GD for convex objectives via an algorithmic-stability analysis. In particular, we show that normalized GD does not overfit during training by establishing finite-time generalization bounds.
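For illustration, the sketch below shows what a normalized GD update looks like for a two-layer network trained with the logistic loss on separable data. It is a minimal toy example, not the paper's experimental setup: the data, width, activation, fixed second layer, and step size are all assumptions made here for concreteness.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable data (assumption: not the paper's setup).
n, d, m = 200, 5, 50                      # samples, input dim, hidden width
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = np.sign(X @ w_star)                   # labels in {-1, +1}

# Two-layer net f(x) = a^T tanh(W x); second layer kept fixed for simplicity.
W = rng.normal(size=(m, d)) / np.sqrt(d)
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)

def loss_and_grad(W):
    Z = X @ W.T                           # (n, m) pre-activations
    H = np.tanh(Z)
    margins = y * (H @ a)                 # y_i * f(x_i)
    loss = np.mean(np.log1p(np.exp(-margins)))       # logistic loss
    g = -1.0 / (1.0 + np.exp(margins)) * y / n       # d loss / d f(x_i)
    dZ = np.outer(g, a) * (1.0 - H ** 2)  # backprop through tanh
    return loss, dZ.T @ X                 # gradient w.r.t. W, shape (m, d)

eta = 0.1
for t in range(500):
    loss, grad = loss_and_grad(W)
    # Normalized GD step: scale the update by the inverse gradient norm.
    W -= eta * grad / (np.linalg.norm(grad) + 1e-12)

print(f"final training loss: {loss:.3e}")
```

The only difference from plain GD is the division by the gradient norm in the update; on separable data this keeps the step size effectively large as the loss (and hence the gradient) shrinks, which is the mechanism behind the fast convergence studied in the paper.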

Published

2023-06-26

How to Cite

Taheri, H., & Thrampoulidis, C. (2023). Fast Convergence in Learning Two-Layer Neural Networks with Separable Data. Proceedings of the AAAI Conference on Artificial Intelligence, 37(8), 9944-9952. https://doi.org/10.1609/aaai.v37i8.26186

Section

AAAI Technical Track on Machine Learning III