1. Shani L, Efroni Y, Mannor S. Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs. AAAI [Internet]. 2020 Apr 3 [cited 2024 Nov 26];34(04):5668-75. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/6021