Yang, L., Zhang, Y., Zheng, G., Zheng, Q., Li, P., Huang, J., & Pan, G. (2022). Policy Optimization with Stochastic Mirror Descent. Proceedings of the AAAI Conference on Artificial Intelligence, 36(8), 8823-8831. https://doi.org/10.1609/aaai.v36i8.20863