On Instance-Dependent Bounds for Offline Reinforcement Learning with Linear Function Approximation

Authors

  • Thanh Nguyen-Tang, Johns Hopkins University
  • Ming Yin, UC Santa Barbara
  • Sunil Gupta, Deakin University, Australia
  • Svetha Venkatesh, Deakin University, Australia
  • Raman Arora, Johns Hopkins University

DOI:

https://doi.org/10.1609/aaai.v37i8.26116

Keywords:

ML: Reinforcement Learning Theory

Abstract

Sample-efficient offline reinforcement learning (RL) with linear function approximation has been studied extensively in recent years. Much of the prior work yields instance-independent rates that hold even for the worst-case realization of a problem instance. This work seeks to understand instance-dependent bounds for offline RL with linear function approximation. We present an algorithm called Bootstrapped and Constrained Pessimistic Value Iteration (BCP-VI), which leverages data bootstrapping and constrained optimization on top of pessimism. We show that, under a partial data coverage assumption (concentrability with respect to an optimal policy), the proposed algorithm achieves a fast rate for offline RL whenever there is a positive gap in the optimal Q-value functions, even if the offline data were collected adaptively. Moreover, when the linear features of the optimal actions in the states reachable by an optimal policy span those reachable by the behavior policy, and the optimal actions are unique, offline RL achieves exactly zero sub-optimality once the number of episodes exceeds a finite, instance-dependent threshold. To the best of our knowledge, these are the first results giving a fast-rate bound on the sub-optimality, and an absolute zero sub-optimality bound, for offline RL with linear function approximation from adaptive data with partial coverage. We also provide instance-agnostic and instance-dependent information-theoretic lower bounds that complement our upper bounds.
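
For context, the sketch below illustrates the pessimism principle that BCP-VI builds on: under linear function approximation, a Q-value estimate is obtained by ridge regression on the offline data and then penalized by an elliptical uncertainty bonus, so poorly covered state-action pairs are valued conservatively. This is a minimal, hypothetical sketch of a single pessimistic backup step for generic pessimistic value iteration, not the paper's BCP-VI algorithm (which additionally uses data bootstrapping and constrained optimization); the function and parameter names (ridge_lambda, beta) are illustrative assumptions.

```python
import numpy as np

def pessimistic_q(phi_sa, dataset_features, dataset_targets,
                  ridge_lambda=1.0, beta=1.0):
    """Pessimistic Q estimate: ridge regression minus an uncertainty penalty.

    phi_sa:           (d,) feature vector of the query state-action pair
    dataset_features: (n, d) features of the offline transitions
    dataset_targets:  (n,) regression targets (reward + estimated next value)
    """
    d = dataset_features.shape[1]
    # Regularized empirical covariance: Lambda = sum_i phi_i phi_i^T + lambda * I
    Lambda = dataset_features.T @ dataset_features + ridge_lambda * np.eye(d)
    # Ridge-regression weights: w = Lambda^{-1} sum_i phi_i * target_i
    w = np.linalg.solve(Lambda, dataset_features.T @ dataset_targets)
    # Elliptical uncertainty bonus: ||phi||_{Lambda^{-1}}
    bonus = np.sqrt(phi_sa @ np.linalg.solve(Lambda, phi_sa))
    # Pessimism: subtract the bonus so poorly covered actions are penalized
    return phi_sa @ w - beta * bonus
```

A full value-iteration procedure would apply such a backup at every step of the horizon, using the (truncated) pessimistic value of the next step to form the regression targets.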

Published

2023-06-26

How to Cite

Nguyen-Tang, T., Yin, M., Gupta, S., Venkatesh, S., & Arora, R. (2023). On Instance-Dependent Bounds for Offline Reinforcement Learning with Linear Function Approximation. Proceedings of the AAAI Conference on Artificial Intelligence, 37(8), 9310-9318. https://doi.org/10.1609/aaai.v37i8.26116

Issue

Vol. 37 No. 8 (2023)

Section

AAAI Technical Track on Machine Learning III