Yang, Long, Yu Zhang, Gang Zheng, Qian Zheng, Pengfei Li, Jianhang Huang, and Gang Pan. 2022. “Policy Optimization With Stochastic Mirror Descent.” Proceedings of the AAAI Conference on Artificial Intelligence 36 (8): 8823–31. https://doi.org/10.1609/aaai.v36i8.20863.