[1] Gan, Y. et al. 2021. Stabilizing Q Learning Via Soft Mellowmax Operator. Proceedings of the AAAI Conference on Artificial Intelligence. 35, 9 (May 2021), 7501–7509. DOI:https://doi.org/10.1609/aaai.v35i9.16919.