Wu, X., Xie, Y., Du, S. S., & Ward, R. (2022). AdaLoss: A Computationally-Efficient and Provably Convergent Adaptive Gradient Method. Proceedings of the AAAI Conference on Artificial Intelligence, 36(8), 8691–8699. https://doi.org/10.1609/aaai.v36i8.20848