Yang, L. (2022) “Policy Optimization with Stochastic Mirror Descent”, Proceedings of the AAAI Conference on Artificial Intelligence, 36(8), pp. 8823–8831. doi: 10.1609/aaai.v36i8.20863.