Yang, Long, Yu Zhang, Gang Zheng, Qian Zheng, Pengfei Li, Jianhang Huang, and Gang Pan. “Policy Optimization With Stochastic Mirror Descent.” Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 8 (June 28, 2022): 8823–8831. Accessed July 21, 2026. https://ojs.aaai.org/index.php/AAAI/article/view/20863.