On Instance-Dependent Bounds for Offline Reinforcement Learning with Linear Function Approximation

Authors

  • Thanh Nguyen-Tang, Johns Hopkins University
  • Ming Yin, UC Santa Barbara
  • Sunil Gupta, Deakin University, Australia
  • Svetha Venkatesh, Deakin University, Australia
  • Raman Arora, Johns Hopkins University

DOI:

https://doi.org/10.1609/aaai.v37i8.26116

Keywords:

ML: Reinforcement Learning Theory

Abstract

Sample-efficient offline reinforcement learning (RL) with linear function approximation has been studied extensively in recent years. Much of the prior work yields instance-independent rates that hold even for the worst-case realization of a problem instance. This work seeks to understand instance-dependent bounds for offline RL with linear function approximation. We present an algorithm called Bootstrapped and Constrained Pessimistic Value Iteration (BCP-VI), which leverages data bootstrapping and constrained optimization on top of pessimism. We show that, under a partial data coverage assumption (concentrability with respect to an optimal policy), the proposed algorithm achieves a fast rate for offline RL whenever there is a positive gap in the optimal Q-value functions, even if the offline data were collected adaptively. Moreover, when the linear features of the optimal actions in the states reachable by an optimal policy span those reachable by the behavior policy, and the optimal actions are unique, offline RL achieves exactly zero sub-optimality once the number of episodes exceeds a finite, instance-dependent threshold. To the best of our knowledge, these are the first results giving a fast-rate bound on the sub-optimality, and an absolute zero sub-optimality bound, for offline RL with linear function approximation from adaptive data with partial coverage. We also provide instance-agnostic and instance-dependent information-theoretic lower bounds that complement our upper bounds.
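
For context, the sketch below illustrates the pessimism principle that BCP-VI builds on: under linear function approximation, a Q-value estimate is obtained by ridge regression on the offline data and then penalized by an elliptical uncertainty bonus, so poorly covered state-action pairs are valued conservatively. This is a minimal, hypothetical sketch of a single pessimistic backup step for generic pessimistic value iteration, not the paper's BCP-VI algorithm (which additionally uses data bootstrapping and constrained optimization); the function and parameter names (ridge_lambda, beta) are illustrative assumptions.

```python
import numpy as np

def pessimistic_q(phi_sa, dataset_features, dataset_targets,
                  ridge_lambda=1.0, beta=1.0):
    """Pessimistic Q estimate: ridge regression minus an uncertainty penalty.

    phi_sa:           (d,) feature vector of the query state-action pair
    dataset_features: (n, d) features of the offline transitions
    dataset_targets:  (n,) regression targets (reward + estimated next value)
    """
    d = dataset_features.shape[1]
    # Regularized empirical covariance: Lambda = sum_i phi_i phi_i^T + lambda * I
    Lambda = dataset_features.T @ dataset_features + ridge_lambda * np.eye(d)
    # Ridge-regression weights: w = Lambda^{-1} sum_i phi_i * target_i
    w = np.linalg.solve(Lambda, dataset_features.T @ dataset_targets)
    # Elliptical uncertainty bonus: ||phi||_{Lambda^{-1}}
    bonus = np.sqrt(phi_sa @ np.linalg.solve(Lambda, phi_sa))
    # Pessimism: subtract the bonus so poorly covered actions are penalized
    return phi_sa @ w - beta * bonus
```

A full value-iteration procedure would apply such a backup at every step of the horizon, using the (truncated) pessimistic value of the next step to form the regression targets.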

Published

2023-06-26

How to Cite

Nguyen-Tang, T., Yin, M., Gupta, S., Venkatesh, S., & Arora, R. (2023). On Instance-Dependent Bounds for Offline Reinforcement Learning with Linear Function Approximation. Proceedings of the AAAI Conference on Artificial Intelligence, 37(8), 9310-9318. https://doi.org/10.1609/aaai.v37i8.26116

Issue

Vol. 37 No. 8 (2023)

Section

AAAI Technical Track on Machine Learning III