Yang, Long, Yu Zhang, Gang Zheng, Qian Zheng, Pengfei Li, Jianhang Huang, and Gang Pan. “Policy Optimization With Stochastic Mirror Descent.” Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 8 (June 28, 2022): 8823–8831. Accessed May 25, 2024. https://ojs.aaai.org/index.php/AAAI/article/view/20863.