Metelli, Alberto Maria, Matteo Papini, Pierluca D’Oro, and Marcello Restelli. “Policy Optimization As Online Learning With Mediator Feedback”. Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 10 (May 18, 2021): 8958–8966. Accessed May 26, 2026. https://ojs.aaai.org/index.php/AAAI/article/view/17083.