Improving Approximate Value Iteration with Complex Returns by Bounding

Authors

  • Robert Wright Air Force Research Laboratory - Information Directorate and Binghamton University
  • Xingye Qiao Binghamton University
  • Steven Loscalzo Air Force Research Laboratory - Information Directorate
  • Lei Yu Binghamton University

DOI:

https://doi.org/10.1609/aaai.v29i1.9568

Keywords:

Reinforcement Learning, Approximate Value Iteration, Complex Returns, Off-Policy

Abstract

Approximate value iteration (AVI) is a widely used technique in reinforcement learning. Most AVI methods do not take full advantage of the sequential relationship between samples within a trajectory in deriving value estimates, due to the challenges in dealing with the inherent bias and variance in the $n$-step returns. We propose a bounding method which uses a negatively biased but relatively low variance estimator generated from a complex return to provide a lower bound on the observed value of a traditional one-step return estimator. In addition, we develop a new Bounded FQI algorithm, which efficiently incorporates the bounding method into an AVI framework. Experiments show that our method produces more accurate value estimates than existing approaches, resulting in improved policies.
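The core idea can be illustrated with a minimal sketch. The helper names and the particular complex-return estimator below (a plain average of the $n$-step returns along a trajectory segment) are illustrative assumptions, not the paper's exact formulation: because the complex return tends to underestimate the value under an improving policy, it can serve as a lower bound, and the bounded target simply takes the maximum of it and the one-step return.

```python
def one_step_return(r, q_next, gamma=0.95):
    # Standard one-step target: r + gamma * max_a Q(s', a)
    return r + gamma * max(q_next)

def n_step_return(rewards, q_tail, gamma=0.95):
    # n-step return over an observed trajectory segment:
    # sum_{i=0}^{n-1} gamma^i * r_i  +  gamma^n * max_a Q(s_n, a)
    n = len(rewards)
    discounted = sum(gamma**i * r_i for i, r_i in enumerate(rewards))
    return discounted + gamma**n * max(q_tail)

def bounded_target(rewards, q_values, gamma=0.95):
    # rewards[i] is the reward at step i; q_values[i] holds the current
    # Q-estimates (one per action) at state s_{i+1} of the trajectory.
    # The "complex return" here is an unweighted average of the n-step
    # returns -- a hypothetical stand-in for the paper's estimator.
    # Being negatively biased, it acts as a lower bound on the value,
    # so the bounded target is the max of it and the one-step return.
    r1 = one_step_return(rewards[0], q_values[0], gamma)
    complex_est = sum(
        n_step_return(rewards[:n], q_values[n - 1], gamma)
        for n in range(1, len(rewards) + 1)
    ) / len(rewards)
    return max(r1, complex_est)
```

In a Fitted Q-Iteration loop, such a bounded target would replace the usual one-step regression target for each sampled transition, which is the role the paper's Bounded FQI algorithm plays within the AVI framework.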

Published

2015-02-21

How to Cite

Wright, R., Qiao, X., Loscalzo, S., & Yu, L. (2015). Improving Approximate Value Iteration with Complex Returns by Bounding. Proceedings of the AAAI Conference on Artificial Intelligence, 29(1). https://doi.org/10.1609/aaai.v29i1.9568

Section

Main Track: Novel Machine Learning Algorithms