[1] Y. Gan, Z. Zhang, and X. Tan, “Stabilizing Q Learning Via Soft Mellowmax Operator,” AAAI, vol. 35, no. 9, pp. 7501–7509, May 2021.