[1] Gan, Y., Zhang, Z. and Tan, X. 2021. Stabilizing Q Learning Via Soft Mellowmax Operator. Proceedings of the AAAI Conference on Artificial Intelligence. 35, 9 (May 2021), 7501-7509. DOI:https://doi.org/10.1609/aaai.v35i9.16919.