Weinstein, Ari, and Michael Littman. “Bandit-Based Planning and Learning in Continuous-Action Markov Decision Processes.” Proceedings of the International Conference on Automated Planning and Scheduling 22, no. 1 (May 14, 2012): 306–314. Accessed November 22, 2024. https://ojs.aaai.org/index.php/ICAPS/article/view/13507.