Context Tree Maximizing

Authors

  • Phuong Nguyen Australian National University, NICTA
  • Peter Sunehag Australian National University
  • Marcus Hutter Australian National University, NICTA, ETHZ

DOI:

https://doi.org/10.1609/aaai.v26i1.8310

Keywords:

Context Tree Maximizing, Context Tree, Markov Decision Process

Abstract

Recent developments in reinforcement learning for non-Markovianproblems witness a surge in history-based methods, among which weare particularly interested in two frameworks, PhiMDP and MC-AIXI-CTW. PhiMDP attempts to reduce the general RL problem, where the environment's states and dynamics are both unknown, toan MDP, while MC-AIXI-CTW incrementally learns a mixture of contexttrees as its environment model. The main idea of PhiMDP is toconnect generic reinforcement learning with classical reinforcementlearning. The first implementation of PhiMDP relies on astochastic search procedure for finding a tree that minimizes acertain cost function. This does not guarantee finding theminimizing tree, or even a good one, given limited search time. As aconsequence it appears that the approach has difficulties with largedomains. MC-AIXI-CTW is attractive in that it can incrementally andanalytically compute the internal model through interactions withthe environment. Unfortunately, it is computationally demanding dueto requiring heavy planning simulations at every single time step.We devise a novel approach called CTMRL, which analytically andefficiently finds the cost-minimizing tree. Instead of thecontext-tree weighting method that MC-AIXI-CTW is based on, we usethe closely related context-tree maximizing algorithm that selectsjust one single tree. This approach falls under the PhiMDPframework, which allows the replacement of the costly planningcomponent of MC-AIXI-CTW with simple Q-Learning. Our empiricalinvestigation show that CTMRL finds policies of quality as good as MC-AIXI-CTW's on sixdomains including a challenging Pacman domain, but in an order ofmagnitude less time.

Downloads

Published

2021-09-20

How to Cite

Nguyen, P., Sunehag, P., & Hutter, M. (2021). Context Tree Maximizing. Proceedings of the AAAI Conference on Artificial Intelligence, 26(1), 1075–1082. https://doi.org/10.1609/aaai.v26i1.8310

Issue

Section

AAAI Technical Track: Machine Learning