Exploiting Learned Policies and Learned Heuristics in Bounded-Suboptimal Search


  • Matias Greco Departamento de Ciencia de la Computación, Pontificia Universidad Católica de Chile


Machine And Deep Learning In Search, Combinatorial Puzzles, Problem Solving Using Search, Reinforcement Learning And Search


Machine Learning (ML) has made significant progress to perform different tasks, such as image classification, speech recognition, and natural language processing, mainly driven by deep learning. Also, ML algorithms, through learning policies or heuristics estimates, have demonstrated potential for solving deterministic problems that would usually be solved using search techniques. Nevertheless, in solving a search problem with purely learning techniques, it is not possible to deliver guarantees regarding the quality of the solution. This research explores how a learned policy or heuristic can be integrated with a bounded-suboptimal search algorithm using Focal Search, sorting the FOCAL list using the concept of discrepancies to speed up the search. On the experimental side, we train a simple neural network as a learned policy and the DeepCubeA as a learned heuristic for the 15-puzzle domain. The results show that a learned policy or heuristic can reduce, at least, one order of magnitude, the expansions than WA* with the same bound and deliver better solution quality.