Error Analysis Affected by Heavy-Tailed Gradients for Non-Convex Pairwise Stochastic Gradient Descent

Authors

  • Jun Chen College of Informatics, Huazhong Agricultural University, Wuhan, China
  • Hong Chen College of Informatics, Huazhong Agricultural University, Wuhan, China Engineering Research Center of Intelligent Technology for Agriculture, Ministry of Education, Wuhan, China Key Laboratory of Smart Farming for Agricultural Animals, Wuhan, China
  • Bin Gu School of Artificial Intelligence, Jilin University, Jilin, China
  • Guodong Liu University of Pittsburgh
  • Yingjie Wang College of Control Science and Engineering, China University of Petroleum (East China), Qingdao, China
  • Weifu Li College of Informatics, Huazhong Agricultural University, Wuhan, China Engineering Research Center of Intelligent Technology for Agriculture, Ministry of Education, Wuhan, China Key Laboratory of Smart Farming for Agricultural Animals, Wuhan, China

DOI:

https://doi.org/10.1609/aaai.v39i15.33735

Abstract

In recent years, a growing number of works have studied the generalization properties of stochastic gradient descent (SGD) from the perspective of algorithmic stability. However, few of them study generalization and optimization simultaneously in the non-convex setting, especially for pairwise SGD with heavy-tailed gradient noise. This paper considers the impact of heavy-tailed gradient noise obeying a sub-Weibull distribution on stability-based learning guarantees for non-convex pairwise SGD by investigating its generalization and optimization jointly. Specifically, based on two novel pairwise uniform model stability tools, we first bound the generalization error of pairwise SGD in the general non-convex setting after establishing the quantitative relationship between stability and generalization error. Then, we consider the practical heavy-tailed sub-Weibull gradient noise condition to establish a refined generalization bound without the bounded gradient condition. Finally, sharper error bounds for generalization and optimization are derived by introducing the gradient dominance condition. Comparing these results reveals that sub-Weibull gradient noise introduces a positive dependence on the heavy-tailed strength for both generalization and optimization. Furthermore, we extend our analysis to the corresponding pairwise minibatch SGD and derive the first stability-based near-optimal generalization and optimization bounds, which are consistent with many empirical observations.
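To make the setting concrete, the sketch below shows the basic pairwise (minibatch) SGD update the abstract refers to: at each step, index pairs are sampled uniformly and the iterate moves along the averaged per-pair gradient. This is a minimal illustration under assumed choices (a pairwise squared loss, constant step size, uniform pair sampling), not the paper's exact algorithm, conditions, or analysis.

```python
import numpy as np

def pairwise_sgd(X, y, loss_grad, eta=0.01, T=1000, batch=1, seed=0):
    """Illustrative pairwise (minibatch) SGD.

    At each iteration, sample `batch` index pairs (i, j) uniformly at
    random and descend along the average of the per-pair gradients.
    With batch=1 this is plain pairwise SGD.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(T):
        pairs = rng.integers(0, n, size=(batch, 2))  # uniform pair sampling
        g = np.mean(
            [loss_grad(w, X[i], y[i], X[j], y[j]) for i, j in pairs], axis=0
        )
        w -= eta * g  # constant step size, for illustration only
    return w

def sq_pair_grad(w, x, yx, xp, yxp):
    """Gradient of an assumed pairwise squared loss
    ((x - x') @ w - (y - y'))**2, used here purely as an example."""
    diff = x - xp
    return 2.0 * (diff @ w - (yx - yxp)) * diff
```

For instance, on noiseless linear data `y = X @ w_true`, this pairwise update recovers `w_true`, since the pairwise squared loss vanishes there; the paper's interest is in how heavy-tailed (sub-Weibull) gradient noise affects such iterates' stability and error bounds.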

Published

2025-04-11

How to Cite

Chen, J., Chen, H., Gu, B., Liu, G., Wang, Y., & Li, W. (2025). Error Analysis Affected by Heavy-Tailed Gradients for Non-Convex Pairwise Stochastic Gradient Descent. Proceedings of the AAAI Conference on Artificial Intelligence, 39(15), 15803–15811. https://doi.org/10.1609/aaai.v39i15.33735

Section

AAAI Technical Track on Machine Learning I