Dynamic Automaton-Guided Reward Shaping for Monte Carlo Tree Search
Keywords: Planning with Markov Models (MDPs, POMDPs), Sequential Decision Making, Reinforcement Learning, Neuro-Symbolic AI (NSAI)
Abstract
Reinforcement learning and planning have been revolutionized in recent years, due in part to the mass adoption of deep convolutional neural networks and the resurgence of powerful methods to refine decision-making policies. However, the problem of sparse reward signals and their representation remains pervasive in many domains. While various reward-shaping mechanisms and imitation learning approaches have been proposed to mitigate this problem, the use of human-aided artificial rewards introduces human error, sub-optimal behavior, and a greater propensity for reward hacking. In this paper, we address this by representing objectives as automata in order to define novel reward shaping functions over this structured representation. In doing so, we tackle the sparse rewards problem within a novel implementation of Monte Carlo Tree Search (MCTS) by proposing a reward shaping function which is updated dynamically to capture statistics on the utility of each automaton transition as it pertains to satisfying the goal of the agent. We further demonstrate that such automaton-guided reward shaping can be utilized to facilitate transfer learning between different environments when the objective is the same.
How to Cite
Velasquez, A., Bissey, B., Barak, L., Beckus, A., Alkhouri, I., Melcer, D., & Atia, G. (2021). Dynamic Automaton-Guided Reward Shaping for Monte Carlo Tree Search. Proceedings of the AAAI Conference on Artificial Intelligence, 35(13), 12015-12023. Retrieved from https://ojs.aaai.org/index.php/AAAI/article/view/17427
AAAI Technical Track on Planning, Routing, and Scheduling