[1] L. Shani, Y. Efroni, and S. Mannor, “Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs,” AAAI, vol. 34, no. 04, pp. 5668-5675, Apr. 2020.