Low, S. M., Kumar, A., & Sanner, S. (2022). Sample-Efficient Iterative Lower Bound Optimization of Deep Reactive Policies for Planning in Continuous MDPs. Proceedings of the AAAI Conference on Artificial Intelligence, 36(9), 9840-9848. https://doi.org/10.1609/aaai.v36i9.21220