[1] A. M. Metelli, M. Papini, P. D’Oro, and M. Restelli, “Policy Optimization as Online Learning with Mediator Feedback”, AAAI, vol. 35, no. 10, pp. 8958–8966, May 2021.