Metelli, A. M., Papini, M., D’Oro, P., & Restelli, M. (2021). Policy Optimization as Online Learning with Mediator Feedback. Proceedings of the AAAI Conference on Artificial Intelligence, 35(10), 8958-8966. https://doi.org/10.1609/aaai.v35i10.17083