Low, Siow Meng, Akshat Kumar, and Scott Sanner. 2022. “Sample-Efficient Iterative Lower Bound Optimization of Deep Reactive Policies for Planning in Continuous MDPs.” Proceedings of the AAAI Conference on Artificial Intelligence 36 (9): 9840–48. https://doi.org/10.1609/aaai.v36i9.21220.