Gradient Descent Averaging and Primal-dual Averaging for Strongly Convex Optimization

Wei Tao; Wei Li; Zhisong Pan; Qing Tao

doi:10.1609/aaai.v35i11.17183

Authors

Wei Tao: Academy of Military Science; Army Engineering University
Wei Li: Army Engineering University
Zhisong Pan: Army Engineering University
Qing Tao: Army Academy of Artillery and Air Defense; Chinese Academy of Sciences

DOI:

https://doi.org/10.1609/aaai.v35i11.17183

Keywords:

Optimization

Abstract

Averaging schemes have attracted extensive attention in deep learning as well as in traditional machine learning. They achieve theoretically optimal convergence and also improve empirical model performance. However, convergence analysis for strongly convex optimization is still insufficient. In particular, the convergence of the last iterate of gradient descent methods, referred to as individual convergence, fails to attain optimality because of a logarithmic factor. To remove this factor, we first develop gradient descent averaging (GDA), a general projection-based dual averaging algorithm for the strongly convex setting. We further present primal-dual averaging for strongly convex cases (SC-PDA), in which primal and dual averaging schemes are used simultaneously. We prove that GDA attains the optimal convergence rate in terms of output averaging, while SC-PDA attains the optimal individual convergence rate. Experiments on SVMs and deep learning models confirm the theoretical analysis and demonstrate the effectiveness of the algorithms.
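The abstract describes the two methods only at a high level. The Python sketch below is therefore one possible reading of them and not the authors' algorithms. Everything in it is an assumption made for illustration: the toy objective, the ball constraint, the linear weights w_t = t, the step 1/mu, and the hypothetical helpers `gda`, `sc_pda` and `da_point`.

The GDA-style routine works as follows:
- At each queried point x_i it forms the gradient-descent point x_i - g_i/mu.
- It averages these points with weights and projects the result onto the feasible set. This is the minimizer over Q of the summed strongly convex lower models <g_i, x> + (mu/2)||x - x_i||^2.
- It reports a weighted output average of the iterates.

The SC-PDA-style routine reuses the same dual-averaging points, but it feeds them into a running primal average at which subgradients are queried. It then reports that average's last iterate. This mirrors the abstract's distinction between output averaging and individual convergence.

```python
import math

# Toy problem (an assumption, not from the paper): a nonsmooth, mu-strongly convex objective
#   f(x) = (mu/2) * ||x - c||^2 + lam * ||x||_1   over Q = {x : ||x|| <= RADIUS}.
MU, LAM, RADIUS = 1.0, 0.7, 10.0
C = [2.0, -0.5, 1.0]


def f(x):
    quad = 0.5 * MU * sum((xi - ci) ** 2 for xi, ci in zip(x, C))
    return quad + LAM * sum(abs(xi) for xi in x)


def sign(v):
    return (v > 0) - (v < 0)


def subgrad(x):
    # One element of the subdifferential (sign(0) = 0 is a valid choice for |.|).
    return [MU * (xi - ci) + LAM * sign(xi) for xi, ci in zip(x, C)]


def minimizer():
    # Closed form: soft-threshold c at lam/mu; it lies inside Q for these constants.
    thr = LAM / MU
    return [math.copysign(max(abs(ci) - thr, 0.0), ci) for ci in C]


def project(z):
    # Euclidean projection onto the ball Q.
    norm = math.sqrt(sum(v * v for v in z))
    return z if norm <= RADIUS else [v * RADIUS / norm for v in z]


def da_point(acc, wsum):
    # argmin over Q of sum_i w_i * (<g_i, x> + (mu/2) * ||x - x_i||^2).
    # The sum is an isotropic quadratic centred at the weighted mean of the
    # gradient-descent points x_i - g_i/mu, so the constrained minimizer is its projection.
    return project([a / wsum for a in acc])


def gda(T, dim=3):
    """Hypothetical GDA-style sketch: returns the output average sum_t t*x_t / sum_t t."""
    x, acc, out, wsum = [0.0] * dim, [0.0] * dim, [0.0] * dim, 0.0
    for t in range(1, T + 1):
        w = float(t)  # linearly increasing weights (assumption)
        g = subgrad(x)
        acc = [a + w * (xi - gi / MU) for a, xi, gi in zip(acc, x, g)]
        out = [o + w * xi for o, xi in zip(out, x)]
        wsum += w
        x = da_point(acc, wsum)
    return [o / wsum for o in out]


def sc_pda(T, dim=3):
    """Hypothetical SC-PDA-style sketch: returns the final primal iterate itself."""
    x, acc, wsum = [0.0] * dim, [0.0] * dim, 0.0
    for t in range(1, T + 1):
        w = float(t)
        g = subgrad(x)  # subgradient queried at the primal average x_t
        acc = [a + w * (xi - gi / MU) for a, xi, gi in zip(acc, x, g)]
        wsum += w
        y = da_point(acc, wsum)  # dual-averaging point y_{t+1}
        w_next = float(t + 1)
        # Primal averaging: x_{t+1} = (W_t * x_t + w_{t+1} * y_{t+1}) / W_{t+1}
        x = [(wsum * xi + w_next * yi) / (wsum + w_next) for xi, yi in zip(x, y)]
    return x


if __name__ == "__main__":
    f_star = f(minimizer())
    for name, algo in (("GDA", gda), ("SC-PDA", sc_pda)):
        for T in (10, 100, 1000):
            print(name, T, f(algo(T)) - f_star)
```

Running the module prints the suboptimality gap f(x) - f* of both outputs for a few horizons. The subgradient oracle is deterministic here. Replacing `subgrad` with a noisy oracle turns the sketch into a stochastic experiment. The toy problem is not constructed to exhibit the logarithmic factor of the plain last iterate.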
