A Trace-restricted Kronecker-Factored Approximation to Natural Gradient
Keywords: (Deep) Neural Network Algorithms, Optimization
Abstract
Second-order optimization methods can accelerate convergence by preconditioning the gradient with a curvature matrix, and there have been many attempts to use them for training deep neural networks. In this work, inspired by diagonal approximations and by factored approximations such as Kronecker-factored Approximate Curvature (KFAC), we propose a new approximation to the Fisher information matrix (FIM) called Trace-restricted Kronecker-factored Approximate Curvature (TKFAC), which preserves a certain trace relationship between the exact and the approximate FIM. In TKFAC, each block of the approximate FIM is decomposed as a Kronecker product of two smaller matrices, scaled by a coefficient related to the trace. We theoretically analyze TKFAC's approximation error and derive an upper bound on it. We also propose a new damping technique for TKFAC on convolutional neural networks that maintains the advantage of second-order optimization methods during training. Experiments show that our method outperforms several state-of-the-art algorithms on some deep network architectures.
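The trace-scaled Kronecker structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes, for illustration only, that each factor is normalized to unit trace and that the scaling coefficient equals the trace of the exact FIM block, so that the approximate block has the same trace as the exact one. The names `phi`, `psi`, `trace_restricted_block`, and `target_trace` are hypothetical, and the matrices are random stand-ins rather than statistics from a real network.

```python
import random


def trace(m):
    """Sum of the diagonal entries of a square matrix (list of lists)."""
    return sum(m[i][i] for i in range(len(m)))


def kron(a, b):
    """Kronecker product of two square matrices given as lists of lists."""
    p, q = len(a), len(b)
    return [[a[i // q][j // q] * b[i % q][j % q] for j in range(p * q)]
            for i in range(p * q)]


def random_psd(n, rng):
    """Random positive semidefinite matrix built as an average of outer products."""
    samples = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n + 2)]
    return [[sum(s[i] * s[j] for s in samples) / len(samples) for j in range(n)]
            for i in range(n)]


def trace_restricted_block(a, g, block_trace):
    """Build block_trace * (phi kron psi) with tr(phi) = tr(psi) = 1.

    Assumption (not taken from the paper's text): the factors are
    trace-normalized and the coefficient is the trace of the exact block.
    """
    phi = [[v / trace(a) for v in row] for row in a]
    psi = [[v / trace(g) for v in row] for row in g]
    return [[block_trace * v for v in row] for row in kron(phi, psi)]


rng = random.Random(0)
a = random_psd(3, rng)   # stand-in for one Kronecker factor
g = random_psd(2, rng)   # stand-in for the other Kronecker factor
target_trace = 5.0       # stand-in for the trace of the exact FIM block

approx = trace_restricted_block(a, g, target_trace)
trace_gap = abs(trace(approx) - target_trace)
```

Because tr(A ⊗ B) = tr(A) tr(B), normalizing both factors to unit trace leaves the scalar coefficient as the only quantity that sets the trace of the block, which is why `trace_gap` is zero up to floating-point rounding in this sketch.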
How to Cite
Gao, K., Liu, X., Huang, Z., Wang, M., Wang, Z., Xu, D., & Yu, F. (2021). A Trace-restricted Kronecker-Factored Approximation to Natural Gradient. Proceedings of the AAAI Conference on Artificial Intelligence, 35(9), 7519-7527. https://doi.org/10.1609/aaai.v35i9.16921
AAAI Technical Track on Machine Learning II