Gan, Y., Zhang, Z., & Tan, X. (2021). Stabilizing Q Learning Via Soft Mellowmax Operator. Proceedings of the AAAI Conference on Artificial Intelligence, 35(9), 7501-7509. https://doi.org/10.1609/aaai.v35i9.16919