[1] Y. Efroni, N. Merlis, and S. Mannor, “Reinforcement Learning with Trajectory Feedback”, AAAI, vol. 35, no. 8, pp. 7288–7295, May 2021.