Wu, Xiaoxia, et al. “AdaLoss: A Computationally-Efficient and Provably Convergent Adaptive Gradient Method.” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 8, June 2022, pp. 8691-8699, doi:10.1609/aaai.v36i8.20848.