A Trace-restricted Kronecker-Factored Approximation to Natural Gradient

Authors

  • Kaixin Gao, Tianjin University
  • Xiaolei Liu, Tianjin University
  • Zhenghai Huang, Tianjin University
  • Min Wang, Huawei Technologies Co. Ltd
  • Zidong Wang, Huawei Technologies Co. Ltd
  • Dachuan Xu, Beijing University of Technology
  • Fan Yu, Huawei Technologies Co. Ltd

Keywords

(Deep) Neural Network Algorithms, Optimization

Abstract

Second-order optimization methods can accelerate convergence by transforming the gradient with a curvature matrix. There have been many attempts to apply second-order optimization methods to training deep neural networks. In this work, inspired by diagonal approximations and by factored approximations such as Kronecker-factored Approximate Curvature (KFAC), we propose a new approximation to the Fisher information matrix (FIM), called Trace-restricted Kronecker-factored Approximate Curvature (TKFAC), which preserves a certain trace relationship between the exact and the approximate FIM. In TKFAC, each block of the approximate FIM is decomposed as a Kronecker product of two smaller matrices, scaled by a coefficient related to the trace. We theoretically analyze the approximation error of TKFAC and give an upper bound on it. We also propose a new damping technique for TKFAC on convolutional neural networks that retains the advantages of second-order optimization throughout training. Experiments show that our method outperforms several state-of-the-art algorithms on some deep network architectures.
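The abstract does not state TKFAC's formulas, so the following is only a minimal sketch of a trace-matched Kronecker approximation under stated assumptions, not necessarily the exact method from the paper. For a single fully connected layer it assumes KFAC-style factors A = E[a aᵀ] (layer inputs a) and G = E[g gᵀ] (gradients g with respect to the layer's pre-activations), a per-sample gradient vector a ⊗ g, and a scalar δ chosen so that tr(δ A ⊗ G) equals tr(F) for the empirical Fisher block F = E[(a ⊗ g)(a ⊗ g)ᵀ]. Because tr(A ⊗ G) = tr(A) tr(G) and tr(F) = E[‖a‖² ‖g‖²], δ can be computed without ever forming F. All function names are hypothetical, and the paper's exact coefficient and factor definitions may differ.

```python
# Hypothetical sketch: trace-matched Kronecker approximation of one layer's Fisher block.
# Assumption: per-sample gradient of a fully connected layer is kron(a, g), where a is the
# layer input and g is the gradient w.r.t. the pre-activation output (KFAC convention).


def kron(x, y):
    """Kronecker product of two vectors, returned as a flat list."""
    return [xi * yj for xi in x for yj in y]


def outer_mean(vectors):
    """Empirical second moment E[v v^T] over a list of equal-length vectors."""
    n, d = len(vectors), len(vectors[0])
    m = [[0.0] * d for _ in range(d)]
    for v in vectors:
        for i in range(d):
            for j in range(d):
                m[i][j] += v[i] * v[j] / n
    return m


def trace(m):
    """Sum of the diagonal entries of a square matrix."""
    return sum(m[i][i] for i in range(len(m)))


def kron_matrix(a, b):
    """Kronecker product of two square matrices (only used here for checking)."""
    p, q = len(a), len(b)
    return [[a[i // q][j // q] * b[i % q][j % q] for j in range(p * q)]
            for i in range(p * q)]


def trace_restricted_kfac(acts, grads):
    """Return (delta, A, G) such that tr(delta * kron(A, G)) == tr(F)."""
    A = outer_mean(acts)
    G = outer_mean(grads)
    # tr((a⊗g)(a⊗g)^T) = ||a⊗g||^2 = ||a||^2 * ||g||^2, so tr(F) needs no d*d matrix.
    tr_f = sum(sum(x * x for x in a) * sum(y * y for y in g)
               for a, g in zip(acts, grads)) / len(acts)
    delta = tr_f / (trace(A) * trace(G))
    return delta, A, G


# Usage: three samples, input dimension 2, output dimension 3 (made-up numbers).
acts = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
grads = [[0.1, -0.2, 0.3], [1.0, 0.0, 0.5], [-0.4, 0.2, 0.0]]
delta, A, G = trace_restricted_kfac(acts, grads)

# Build the exact empirical Fisher block only to verify the trace relationship.
F = outer_mean([kron(a, g) for a, g in zip(acts, grads)])
approx = kron_matrix(A, G)
print(trace(F), delta * trace(approx))  # the two traces agree up to rounding
```

Forming the full F is done here only as a check; the point of the Kronecker structure is that only the small factors A and G and the scalar δ need to be stored and inverted.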


Published

2021-05-18

How to Cite

Gao, K., Liu, X., Huang, Z., Wang, M., Wang, Z., Xu, D., & Yu, F. (2021). A Trace-restricted Kronecker-Factored Approximation to Natural Gradient. Proceedings of the AAAI Conference on Artificial Intelligence, 35(9), 7519-7527. Retrieved from https://ojs.aaai.org/index.php/AAAI/article/view/16921

Issue

Vol. 35 No. 9

Section

AAAI Technical Track on Machine Learning II