Balancing Exploration and Exploitation in Classical Planning
Keywords: UCT, Monte-Carlo, planning, exploration, exploitation
Successful heuristic search planners for satisficing planning, such as FF or LAMA, are usually based on one or more best-first search techniques. Recent research has led to planners such as Arvand, Roamer, and Probe, in which novel techniques like Monte-Carlo Random Walks extend the traditional exploitation-focused best-first search with an exploration component. The UCT algorithm balances these contradictory incentives and has shown tremendous success in related areas of sequential decision making, but it has not yet been applied to classical planning. We close this gap by applying the Trial-based Heuristic Tree Search (THTS) framework to classical planning. We show how to model the best-first search techniques Weighted A* and Greedy Best-First Search with only three ingredients: action selection, initialization, and backup function. We then use THTS to derive four versions of the UCT algorithm that differ in the backup functions they use. The experimental evaluation shows that our main algorithm, GreedyUCT*, outperforms all other algorithms presented in this paper in terms of both coverage and quality.
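
To illustrate the exploration/exploitation trade-off in the action selection ingredient, the following is a minimal sketch of the generic UCB1 rule, adapted here to cost minimization. It is not the exact formula used by GreedyUCT* or the other variants in this paper. The function name `ucb1_select`, the `(visits, mean_cost)` child representation, and the default exploration constant are assumptions made for illustration only.

```python
import math


def ucb1_select(children, exploration=math.sqrt(2)):
    """Return the index of the child to expand next.

    children: list of (visits, mean_cost) tuples (hypothetical representation).
    exploration: weight of the exploration bonus (assumed default sqrt(2)).
    """
    total_visits = sum(visits for visits, _ in children)
    best_index, best_score = None, math.inf
    for index, (visits, mean_cost) in enumerate(children):
        if visits == 0:
            # Unvisited children are tried before any comparison is made.
            return index
        # Rarely visited children receive a larger bonus.
        bonus = exploration * math.sqrt(math.log(total_visits) / visits)
        # Costs are minimized, so the bonus is subtracted from the estimate.
        score = mean_cost - bonus
        if score < best_score:
            best_index, best_score = index, score
    return best_index
```

With `exploration=0` the rule degenerates to purely greedy selection of the child with the lowest cost estimate. A positive constant can instead favor a child with a slightly worse estimate that has been visited only rarely.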