Tree-Based On-Line Reinforcement Learning

Authors

  • Andre Barreto, Brazilian National Laboratory for Scientific Computing (LNCC)

DOI:

https://doi.org/10.1609/aaai.v28i1.9063

Keywords:

Reinforcement Learning, Markov Decision Processes, Fitted Q-Iteration, Regression Trees, Stochastic Factorization

Abstract

Fitted Q-iteration (FQI) stands out among reinforcement learning algorithms for its flexibility and ease of use. FQI can be combined with any regression method, and this choice determines the algorithm's statistical and computational properties. The combination of FQI with an ensemble of regression trees gives rise to an algorithm, FQIT, that is computationally efficient, scalable to high-dimensional spaces, and robust to noise. Despite these desirable properties and its good performance in practice, FQIT also has a limitation: the fact that an ensemble of trees must be constructed (or updated) at each iteration confines the algorithm to the batch scenario. This paper aims to address this specific issue. Based on a strategy recently proposed in the literature, called the stochastic-factorization trick, we propose a modification of FQIT that makes it fully incremental, and thus suitable for on-line learning. We call the resulting method tree-based stochastic factorization (TBSF). We derive upper bounds for the difference between the value functions computed by FQIT and TBSF, and also show in which circumstances the approximations coincide. A series of computational experiments is presented to illustrate the properties of TBSF and to show its usefulness in practice, including a medical problem involving the treatment of patients infected with HIV.
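To make the batch formulation discussed in the abstract concrete, the sketch below shows one common way to implement fitted Q-iteration with an ensemble of extremely randomized trees, in the spirit of FQIT. It is only an illustration under assumed conventions: the transition format, the function name fitted_q_iteration, and the ExtraTreesRegressor hyperparameters are choices made here for the example, not details taken from the paper, and the sketch deliberately shows the batch version whose per-iteration refitting is the limitation TBSF is designed to remove.

    import numpy as np
    from sklearn.ensemble import ExtraTreesRegressor

    def fitted_q_iteration(transitions, n_actions, gamma=0.98, n_iterations=50):
        """Batch FQI with an extra-trees regressor (illustrative sketch only).

        `transitions` is assumed to be a list of (s, a, r, s_next, done)
        tuples, with `s` a 1-D feature vector and `a` an integer action index.
        """
        states = np.array([t[0] for t in transitions], dtype=float)
        actions = np.array([t[1] for t in transitions], dtype=float).reshape(-1, 1)
        rewards = np.array([t[2] for t in transitions], dtype=float)
        next_states = np.array([t[3] for t in transitions], dtype=float)
        not_done = 1.0 - np.array([t[4] for t in transitions], dtype=float)

        # Regression inputs are state features concatenated with the action index.
        X = np.hstack([states, actions])
        q_regressor = None

        for _ in range(n_iterations):
            if q_regressor is None:
                # First iteration: the target is just the immediate reward.
                targets = rewards
            else:
                # Evaluate the current Q-estimate at every next state and action,
                # then back up the greedy value: r + gamma * max_a' Q(s', a').
                q_next = np.column_stack([
                    q_regressor.predict(
                        np.hstack([next_states,
                                   np.full((len(next_states), 1), a, dtype=float)]))
                    for a in range(n_actions)
                ])
                targets = rewards + gamma * not_done * q_next.max(axis=1)

            # A fresh ensemble of trees is grown at every iteration; this
            # repeated construction is what keeps the formulation batch-only.
            q_regressor = ExtraTreesRegressor(n_estimators=50, min_samples_leaf=5)
            q_regressor.fit(X, targets)

        return q_regressor

Acting greedily with respect to the returned regressor yields the learned policy. Note that any new data requires regrowing the entire ensemble at each iteration, which is precisely the obstacle to on-line learning that the stochastic-factorization trick behind TBSF addresses.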

Published

2014-06-21

How to Cite

Barreto, A. (2014). Tree-Based On-Line Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 28(1). https://doi.org/10.1609/aaai.v28i1.9063

Issue

Vol. 28 No. 1 (2014)

Section

AAAI Technical Track: Reasoning under Uncertainty