[1] Metelli, A.M., Papini, M., D’Oro, P. and Restelli, M. 2021. Policy Optimization as Online Learning with Mediator Feedback. Proceedings of the AAAI Conference on Artificial Intelligence. 35, 10 (May 2021), 8958-8966. DOI:https://doi.org/10.1609/aaai.v35i10.17083.