Bandit-Based Planning and Learning in Continuous-Action Markov Decision Processes

Ari Weinstein; Michael Littman

doi:10.1609/icaps.v22i1.13507

Authors

Ari Weinstein, Rutgers University
Michael Littman, Rutgers University

DOI:

https://doi.org/10.1609/icaps.v22i1.13507

Keywords:

Monte-Carlo tree search, Continuous action planning, stochastic optimization

Abstract

Recent research leverages results from the continuous-armed bandit literature to create a reinforcement-learning algorithm for continuous state and action spaces. The algorithm was initially proposed in a theoretical setting; we provide the first examination of its empirical properties. Through experimentation, we demonstrate the effectiveness of this planning method when coupled with exploration and model learning, and we show that, in addition to its formal guarantees, the approach is very competitive with other continuous-action reinforcement learners.

Published

2012-05-14

How to Cite

Weinstein, A., & Littman, M. (2012). Bandit-Based Planning and Learning in Continuous-Action Markov Decision Processes. Proceedings of the International Conference on Automated Planning and Scheduling, 22(1), 306-314. https://doi.org/10.1609/icaps.v22i1.13507

Issue

Vol. 22 (2012): Twenty-Second International Conference on Automated Planning and Scheduling

Section

Full Technical Papers
