[1] A. Weinstein and M. Littman, “Bandit-Based Planning and Learning in Continuous-Action Markov Decision Processes”, ICAPS, vol. 22, no. 1, pp. 306-314, May 2012.