Convex Hull Monte-Carlo Tree-Search

Authors

  • Michael Painter University of Oxford
  • Bruno Lacerda University of Oxford
  • Nick Hawes University of Oxford

Abstract

This work investigates Monte-Carlo planning for agents in stochastic environments, with multiple objectives. We propose the Convex Hull Monte-Carlo Tree-Search (CHMCTS) framework, which builds upon Trial Based Heuristic Tree Search and Convex Hull Value Iteration (CHVI), as a solution to multi-objective planning in large environments. Moreover, we consider how to pose the problem of approximating multi-objective planning solutions as a contextual multi-armed bandits problem, giving a principled motivation for how to select actions from the view of contextual regret. This leads us to the use of Contextual Zooming for action selection, yielding Zooming CHMCTS. We evaluate our algorithm using the Generalised Deep Sea Treasure environment, demonstrating that Zooming CHMCTS can achieve a sublinear contextual regret and scales better than CHVI on a given computational budget.

Downloads

Published

2020-06-01

How to Cite

Painter, M., Lacerda, B., & Hawes, N. (2020). Convex Hull Monte-Carlo Tree-Search. Proceedings of the International Conference on Automated Planning and Scheduling, 30(1), 217-225. Retrieved from https://ojs.aaai.org/index.php/ICAPS/article/view/6664