[1] S. M. Low, A. Kumar, and S. Sanner, “Sample-Efficient Iterative Lower Bound Optimization of Deep Reactive Policies for Planning in Continuous MDPs,” AAAI, vol. 36, no. 9, pp. 9840–9848, Jun. 2022.