Gan, Y., Zhang, Z., & Tan, X. (2021). Stabilizing Q Learning Via Soft Mellowmax Operator. Proceedings of the AAAI Conference on Artificial Intelligence, 35(9), 7501-7509. Retrieved from https://ojs.aaai.org/index.php/AAAI/article/view/16919