(1) Yang, L.; Zhang, Y.; Zheng, G.; Zheng, Q.; Li, P.; Huang, J.; Pan, G. Policy Optimization With Stochastic Mirror Descent. AAAI 2022, 36, 8823-8831.