Weinstein, Ari, and Michael Littman. 2012. “Bandit-Based Planning and Learning in Continuous-Action Markov Decision Processes.” Proceedings of the International Conference on Automated Planning and Scheduling 22 (1): 306–14. https://doi.org/10.1609/icaps.v22i1.13507.