On Value Function Representation of Long Horizon Problems

Lucas Lehnert; Romain Laroche; Harm van Seijen

doi:10.1609/aaai.v32i1.11646

Authors

Lucas Lehnert Brown University, Providence, Rhode Island
Romain Laroche Microsoft Maluuba, Montreal, QC
Harm van Seijen Microsoft Maluuba, Montreal, QC

DOI:

https://doi.org/10.1609/aaai.v32i1.11646

Keywords:

Reinforcement Learning

Abstract

In Reinforcement Learning, an intelligent agent has to make a sequence of decisions to accomplish a goal. If this sequence is long, the agent has to plan over a long horizon. While learning the optimal policy and its value function is a well-studied problem in Reinforcement Learning, this paper focuses on the structure of the optimal value function and on how hard that function is to represent. We show that the generalized Rademacher complexity of the hypothesis space of all optimal value functions depends on the planning horizon and is independent of the size of the state and action spaces. Further, we present bounds on the action-gaps of action value functions and show that they can collapse if a long planning horizon is used. The theoretical results are verified empirically on randomly generated MDPs and on a grid-world fruit collection task using deep value function approximation. Our theoretical results highlight a connection between value function approximation and the Options framework, and they suggest that value functions should be decomposed along bottlenecks of the MDP's transition dynamics.
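
As a rough illustration of the action-gap quantity mentioned in the abstract, the following Python sketch computes optimal action values on a small random MDP by value iteration and measures the gap between the best and second-best action at each state. It is not the authors' experimental setup: the MDP generator, the discount factors, the use of the discount factor as a stand-in for planning horizon, and the normalization by the range of Q-values are all assumptions made here for illustration, and the function names (`random_mdp`, `optimal_q`, `action_gaps`) are hypothetical.

```python
import numpy as np


def random_mdp(n_states, n_actions, rng):
    """Hypothetical generator: dense random transitions, rewards in [0, 1)."""
    # P[a, s, s'] is the probability of moving from s to s' under action a.
    P = rng.random((n_actions, n_states, n_states))
    P /= P.sum(axis=2, keepdims=True)
    # R[s, a] is the expected immediate reward.
    R = rng.random((n_states, n_actions))
    return P, R


def optimal_q(P, R, gamma, tol=1e-8, max_iter=100_000):
    """Value iteration on Q; stops when successive iterates differ by < tol."""
    Q = np.zeros_like(R)
    for _ in range(max_iter):
        V = Q.max(axis=1)                # greedy state values, shape (S,)
        Q_new = R + gamma * (P @ V).T    # (A, S) transposed to (S, A)
        if np.abs(Q_new - Q).max() < tol:
            return Q_new
        Q = Q_new
    return Q


def action_gaps(Q):
    """Gap between the best and second-best action value at each state."""
    top2 = np.sort(Q, axis=1)[:, -2:]
    return top2[:, 1] - top2[:, 0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    P, R = random_mdp(n_states=20, n_actions=4, rng=rng)
    for gamma in (0.5, 0.9, 0.99):
        Q = optimal_q(P, R, gamma)
        # Assumed normalization: gap relative to the spread of Q-values.
        relative_gap = action_gaps(Q).mean() / (Q.max() - Q.min())
        print(f"gamma={gamma}: mean relative action-gap = {relative_gap:.4f}")
```

Running the script prints one relative gap per discount factor, which can be compared across values of `gamma`; the paper's own bounds and experiments should be consulted for the precise statement of how the gaps behave.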
