Wu, Xiaoxia, Yuege Xie, Simon Shaolei Du, and Rachel Ward. “AdaLoss: A Computationally-Efficient and Provably Convergent Adaptive Gradient Method.” Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 8 (June 28, 2022): 8691–8699. Accessed May 25, 2026. https://ojs.aaai.org/index.php/AAAI/article/view/20848.