Improving Approximate Value Iteration with Complex Returns by Bounding

Authors

  • Robert Wright Air Force Research Laboratory - Information Directorate and Binghamton University
  • Xingye Qiao Binghamton University
  • Steven Loscalzo Air Force Research Laboratory - Information Directorate
  • Lei Yu Binghamton University

DOI:

https://doi.org/10.1609/aaai.v29i1.9568

Keywords:

Reinforcement Learning, Approximate Value Iteration, Complex Returns, Off-Policy

Abstract

Approximate value iteration (AVI) is a widely used technique in reinforcement learning. Most AVI methods do not take full advantage of the sequential relationship between samples within a trajectory in deriving value estimates, due to the challenges in dealing with the inherent bias and variance in the $n$-step returns. We propose a bounding method which uses a negatively biased but relatively low variance estimator generated from a complex return to provide a lower bound on the observed value of a traditional one-step return estimator. In addition, we develop a new Bounded FQI algorithm, which efficiently incorporates the bounding method into an AVI framework. Experiments show that our method produces more accurate value estimates than existing approaches, resulting in improved policies.
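The core idea can be illustrated with a minimal sketch. The helper names and the particular complex-return estimator below (a plain average of the $n$-step returns along a trajectory segment) are illustrative assumptions, not the paper's exact formulation: because the complex return tends to underestimate the value under an improving policy, it can serve as a lower bound, and the bounded target simply takes the maximum of it and the one-step return.

```python
def one_step_return(r, q_next, gamma=0.95):
    # Standard one-step target: r + gamma * max_a Q(s', a)
    return r + gamma * max(q_next)

def n_step_return(rewards, q_tail, gamma=0.95):
    # n-step return over an observed trajectory segment:
    # sum_{i=0}^{n-1} gamma^i * r_i  +  gamma^n * max_a Q(s_n, a)
    n = len(rewards)
    discounted = sum(gamma**i * r_i for i, r_i in enumerate(rewards))
    return discounted + gamma**n * max(q_tail)

def bounded_target(rewards, q_values, gamma=0.95):
    # rewards[i] is the reward at step i; q_values[i] holds the current
    # Q-estimates (one per action) at state s_{i+1} of the trajectory.
    # The "complex return" here is an unweighted average of the n-step
    # returns -- a hypothetical stand-in for the paper's estimator.
    # Being negatively biased, it acts as a lower bound on the value,
    # so the bounded target is the max of it and the one-step return.
    r1 = one_step_return(rewards[0], q_values[0], gamma)
    complex_est = sum(
        n_step_return(rewards[:n], q_values[n - 1], gamma)
        for n in range(1, len(rewards) + 1)
    ) / len(rewards)
    return max(r1, complex_est)
```

In a Fitted Q-Iteration loop, such a bounded target would replace the usual one-step regression target for each sampled transition, which is the role the paper's Bounded FQI algorithm plays within the AVI framework.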

Published

2015-02-21

How to Cite

Wright, R., Qiao, X., Loscalzo, S., & Yu, L. (2015). Improving Approximate Value Iteration with Complex Returns by Bounding. Proceedings of the AAAI Conference on Artificial Intelligence, 29(1). https://doi.org/10.1609/aaai.v29i1.9568

Section

Main Track: Novel Machine Learning Algorithms