(1)
Metelli, A. M.; Papini, M.; D’Oro, P.; Restelli, M. Policy Optimization As Online Learning With Mediator Feedback. AAAI 2021, 35, 8958–8966.