1. Metelli AM, Papini M, D’Oro P, Restelli M. Policy Optimization as Online Learning with Mediator Feedback. AAAI [Internet]. 2021 May 18 [cited 2026 May 26];35(10):8958-66. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/17083