Low, Siow Meng, Akshat Kumar, and Scott Sanner. “Sample-Efficient Iterative Lower Bound Optimization of Deep Reactive Policies for Planning in Continuous MDPs.” Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 9 (June 28, 2022): 9840-9848. Accessed April 23, 2026. https://ojs.aaai.org/index.php/AAAI/article/view/21220.