Yang, L., Y. Zhang, G. Zheng, Q. Zheng, P. Li, J. Huang, and G. Pan. “Policy Optimization With Stochastic Mirror Descent”. Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 8, June 2022, pp. 8823-31, doi:10.1609/aaai.v36i8.20863.