Understanding Stochastic Optimization Behavior at the Layer Update Level (Student Abstract)

Jack Zhang; Guan Xiong Qiao; Alexandru Lopotenco; Ian Tong Pan

doi:10.1609/aaai.v36i11.21691

Authors

Jack Zhang 512 Technologies
Guan Xiong Qiao 512 Technologies
Alexandru Lopotenco 512 Technologies
Ian Tong Pan 512 Technologies

DOI:

https://doi.org/10.1609/aaai.v36i11.21691

Keywords:

Optimization, Stochastic Optimization, Gradient, Parameters, Deep Learning, Deep Neural Networks, Learning Methods

Abstract

Popular first-order stochastic optimization methods for deep neural networks (DNNs) are usually either accelerated schemes (e.g., stochastic gradient descent (SGD) with momentum) or adaptive step-size methods (e.g., Adam/AdaMax, AdaBelief). In many contexts, including image classification with DNNs, adaptive methods tend to generalize worse than SGD, i.e., they get stuck in non-robust local minima; SGD, however, typically converges more slowly. We analyze possible reasons for this behavior by modeling gradient updates as vectors of random variables and comparing them to probabilistic bounds in order to identify "meaningful" updates. In our experiments, we observe that only layers close to the output exhibit "definitely non-random" update behavior. In the future, the tools developed here may help to rigorously quantify and analyze intuitions about why some optimizers and particular DNN architectures perform better than others.
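
The abstract does not say which probabilistic bounds are used, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes one possible test: treat a layer's flattened update as a vector in R^d and ask whether two consecutive updates are more aligned than a random direction would be. For a uniformly random direction, the standard concentration bound P(|cos| >= t) <= 2 exp(-d t^2 / 2) gives a threshold t for a chosen failure probability delta. The function names, the choice of cosine similarity, and the default delta are all hypothetical.

```python
import math


def cosine_threshold(dim, delta=0.01):
    """Alignment level that a random direction in R^dim exceeds with
    probability at most delta, from P(|cos| >= t) <= 2*exp(-dim*t**2/2).
    (Assumed test statistic; the source does not specify its bounds.)"""
    return math.sqrt(2.0 * math.log(2.0 / delta) / dim)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def looks_non_random(prev_update, curr_update, delta=0.01):
    """True if consecutive updates of one layer are aligned (or anti-aligned)
    beyond what the random-direction bound allows at level delta."""
    threshold = cosine_threshold(len(curr_update), delta)
    return abs(cosine(prev_update, curr_update)) >= threshold


# Hypothetical flattened updates for a 100-parameter layer.
dim = 100
repeated = [1.0] * dim                       # same direction on both steps
axis_0 = [1.0] + [0.0] * (dim - 1)           # orthogonal pair of directions
axis_1 = [0.0, 1.0] + [0.0] * (dim - 2)

print(looks_non_random(repeated, repeated))  # identical direction
print(looks_non_random(axis_0, axis_1))      # orthogonal directions
```

Under these assumptions the threshold shrinks like 1/sqrt(d), so large layers need only a small consistent alignment to be flagged, while for very small layers the threshold can exceed 1 and nothing is flagged.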

Published

2022-06-28

How to Cite

Zhang, J., Qiao, G. X., Lopotenco, A., & Pan, I. T. (2022). Understanding Stochastic Optimization Behavior at the Layer Update Level (Student Abstract). Proceedings of the AAAI Conference on Artificial Intelligence, 36(11), 13109-13110. https://doi.org/10.1609/aaai.v36i11.21691

Issue

Vol. 36 No. 11: IAAI-22, EAAI-22, AAAI-22 Special Programs and Special Track, Student Papers and Demonstrations

Section

AAAI Student Abstract and Poster Program
