(1)
Gan, Y.; Zhang, Z.; Tan, X. Stabilizing Q Learning Via Soft Mellowmax Operator. AAAI 2021, 35, 7501-7509.