Error Analysis Affected by Heavy-Tailed Gradients for Non-Convex Pairwise Stochastic Gradient Descent

Authors

  • Jun Chen College of Informatics, Huazhong Agricultural University, Wuhan, China
  • Hong Chen College of Informatics, Huazhong Agricultural University, Wuhan, China Engineering Research Center of Intelligent Technology for Agriculture, Ministry of Education, Wuhan, China Key Laboratory of Smart Farming for Agricultural Animals, Wuhan, China
  • Bin Gu School of Artificial Intelligence, Jilin University, Jilin, China
  • Guodong Liu University of Pittsburgh
  • Yingjie Wang College of Control Science and Engineering, China University of Petroleum (East China), Qingdao, China
  • Weifu Li College of Informatics, Huazhong Agricultural University, Wuhan, China Engineering Research Center of Intelligent Technology for Agriculture, Ministry of Education, Wuhan, China Key Laboratory of Smart Farming for Agricultural Animals, Wuhan, China

DOI:

https://doi.org/10.1609/aaai.v39i15.33735

Abstract

In recent years, a growing number of works have studied the generalization properties of stochastic gradient descent (SGD) from the perspective of algorithmic stability. However, few of them study generalization and optimization simultaneously in the non-convex setting, especially for pairwise SGD with heavy-tailed gradient noise. This paper considers the impact of heavy-tailed gradient noise obeying a sub-Weibull distribution on stability-based learning guarantees for non-convex pairwise SGD by investigating its generalization and optimization jointly. Specifically, based on two novel pairwise uniform model stability tools, we first bound the generalization error of pairwise SGD in the general non-convex setting after establishing the quantitative relationship between stability and generalization error. Then, we consider the practical heavy-tailed sub-Weibull gradient noise condition to establish a refined generalization bound without the bounded gradient condition. Finally, sharper error bounds for generalization and optimization are derived by introducing the gradient dominance condition. Comparing these results reveals that sub-Weibull gradient noise introduces a positive dependence on the heavy-tailed strength for both generalization and optimization. Furthermore, we extend our analysis to the corresponding pairwise minibatch SGD and derive the first stability-based near-optimal generalization and optimization bounds, which are consistent with many empirical observations.
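To make the setting concrete, the sketch below shows the basic pairwise (minibatch) SGD update the abstract refers to: at each step, index pairs are sampled uniformly and the iterate moves along the averaged per-pair gradient. This is a minimal illustration under assumed choices (a pairwise squared loss, constant step size, uniform pair sampling), not the paper's exact algorithm, conditions, or analysis.

```python
import numpy as np

def pairwise_sgd(X, y, loss_grad, eta=0.01, T=1000, batch=1, seed=0):
    """Illustrative pairwise (minibatch) SGD.

    At each iteration, sample `batch` index pairs (i, j) uniformly at
    random and descend along the average of the per-pair gradients.
    With batch=1 this is plain pairwise SGD.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(T):
        pairs = rng.integers(0, n, size=(batch, 2))  # uniform pair sampling
        g = np.mean(
            [loss_grad(w, X[i], y[i], X[j], y[j]) for i, j in pairs], axis=0
        )
        w -= eta * g  # constant step size, for illustration only
    return w

def sq_pair_grad(w, x, yx, xp, yxp):
    """Gradient of an assumed pairwise squared loss
    ((x - x') @ w - (y - y'))**2, used here purely as an example."""
    diff = x - xp
    return 2.0 * (diff @ w - (yx - yxp)) * diff
```

For instance, on noiseless linear data `y = X @ w_true`, this pairwise update recovers `w_true`, since the pairwise squared loss vanishes there; the paper's interest is in how heavy-tailed (sub-Weibull) gradient noise affects such iterates' stability and error bounds.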

Published

2025-04-11

How to Cite

Chen, J., Chen, H., Gu, B., Liu, G., Wang, Y., & Li, W. (2025). Error Analysis Affected by Heavy-Tailed Gradients for Non-Convex Pairwise Stochastic Gradient Descent. Proceedings of the AAAI Conference on Artificial Intelligence, 39(15), 15803–15811. https://doi.org/10.1609/aaai.v39i15.33735

Section

AAAI Technical Track on Machine Learning I