Sample Complexity and Performance Bounds for Non-Parametric Approximate Linear Programming

Authors

  • Jason Pazis, Duke University
  • Ronald Parr, Duke University

DOI:

https://doi.org/10.1609/aaai.v27i1.8696

Keywords:

Reinforcement Learning, Approximate Linear Programming

Abstract

One of the most difficult tasks in value function approximation for Markov Decision Processes is finding an approximation architecture that is expressive enough to capture the important structure in the value function, while at the same time not overfitting the training samples. Recent results in non-parametric approximate linear programming (NP-ALP) have demonstrated that this can be done effectively using nothing more than a smoothness assumption on the value function. In this paper we extend these results to the case where samples come from real-world transitions instead of the full Bellman equation, adding robustness to noise. In addition, we provide the first max-norm, finite-sample performance guarantees for any form of ALP. NP-ALP is amenable to problems with large (multidimensional) or even infinite (continuous) action spaces, and does not require a model to select actions using the resulting approximate solution.

Published

2013-06-30

How to Cite

Pazis, J., & Parr, R. (2013). Sample Complexity and Performance Bounds for Non-Parametric Approximate Linear Programming. Proceedings of the AAAI Conference on Artificial Intelligence, 27(1), 782-788. https://doi.org/10.1609/aaai.v27i1.8696