Bootstrapping Simulation-Based Algorithms with a Suboptimal Policy

Authors

  • Truong-Huy Nguyen Northeastern University
  • Tomi Silander Xerox Research Centre Europe
  • Wee-Sun Lee National University of Singapore
  • Tze-Yun Leong National University of Singapore

DOI:

https://doi.org/10.1609/icaps.v24i1.13644

Keywords:

Markov decision process, sparse sampling, forward sparse sampling, UCT, heuristic

Abstract

Finding optimal policies for Markov Decision Processes with large state spaces is in general intractable. Nonetheless, simulation-based algorithms inspired by Sparse Sampling (SS), such as Upper Confidence Bound applied in Trees (UCT) and Forward Search Sparse Sampling (FSSS), have been shown to perform reasonably well in both theory and practice, despite their high computational demand. To improve the efficiency of these algorithms, we adopt a simple enhancement technique that uses a heuristic policy to speed up the selection of optimal actions. The general method, called Aux, augments the look-ahead tree with auxiliary arms that are evaluated by the heuristic policy. In this paper, we provide theoretical justification for the method and demonstrate its effectiveness on two experimental benchmarks that showcase faster convergence to a near-optimal policy for both SS and FSSS. Moreover, to further speed up the convergence of these algorithms at the early stage, we present a novel mechanism for combining them with UCT so that the resulting hybrid algorithm is superior to both of its components.
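
The idea of an auxiliary arm can be illustrated with a minimal sketch. It is not the authors' implementation: the toy chain MDP, the `step` generative model, the "always move right" `heuristic_policy`, the constants, and the function names are all assumptions made for illustration. The sketch runs plain Sparse Sampling at each node and adds one extra arm whose value is estimated by rolling out the heuristic policy; the node takes the best of the ordinary arms and the auxiliary arm.

```python
import random

GAMMA = 0.9          # discount factor (assumed)
ACTIONS = (0, 1)     # 0 = move left, 1 = move right
GOAL = 4             # rightmost state of the toy chain, pays reward 1

def step(state, action, rng):
    """Hypothetical generative model: a noisy walk on the chain 0..GOAL."""
    move = 1 if action == 1 else -1
    if rng.random() < 0.2:        # 20% chance the move slips the other way
        move = -move
    nxt = min(max(state + move, 0), GOAL)
    return nxt, (1.0 if nxt == GOAL else 0.0)

def heuristic_policy(state):
    """Assumed heuristic (possibly suboptimal) policy: always move right."""
    return 1

def rollout(state, depth, rng):
    """Discounted return of following the heuristic policy for `depth` steps."""
    total, discount = 0.0, 1.0
    for _ in range(depth):
        state, reward = step(state, heuristic_policy(state), rng)
        total += discount * reward
        discount *= GAMMA
    return total

def ss_aux(state, depth, width, rng):
    """Sparse Sampling with one auxiliary arm per node.

    Returns (value estimate, selected action) for `state`.
    """
    if depth == 0:
        return 0.0, heuristic_policy(state)
    best_value, best_action = float("-inf"), None
    # Ordinary arms: sample `width` successors per action and recurse.
    for action in ACTIONS:
        total = 0.0
        for _ in range(width):
            nxt, reward = step(state, action, rng)
            child_value, _ = ss_aux(nxt, depth - 1, width, rng)
            total += reward + GAMMA * child_value
        if total / width > best_value:
            best_value, best_action = total / width, action
    # Auxiliary arm: average return of heuristic-policy rollouts.
    aux_value = sum(rollout(state, depth, rng) for _ in range(width)) / width
    if aux_value > best_value:
        best_value, best_action = aux_value, heuristic_policy(state)
    return best_value, best_action

rng = random.Random(0)  # fixed seed so repeated runs behave identically
value, action = ss_aux(state=2, depth=3, width=3, rng=rng)
```

Because the expected rollout return of any fixed policy cannot exceed the optimal value, taking the maximum over the ordinary arms and the auxiliary arm does not change what the estimate targets in expectation; it only lets a good heuristic raise the estimate when few samples are available.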

Published

2014-05-11

How to Cite

Nguyen, T.-H., Silander, T., Lee, W.-S., & Leong, T.-Y. (2014). Bootstrapping Simulation-Based Algorithms with a Suboptimal Policy. Proceedings of the International Conference on Automated Planning and Scheduling, 24(1), 181-189. https://doi.org/10.1609/icaps.v24i1.13644