(1) Low, S. M.; Kumar, A.; Sanner, S. Sample-Efficient Iterative Lower Bound Optimization of Deep Reactive Policies for Planning in Continuous MDPs. AAAI 2022, 36, 9840-9848.