(1) Weinstein, A.; Littman, M. Bandit-Based Planning and Learning in Continuous-Action Markov Decision Processes. ICAPS 2012, 22, 306-314.