The Sample Complexity of Teaching by Reinforcement on Q-Learning

Xuezhou Zhang; Shubham Bharti; Yuzhe Ma; Adish Singla; Xiaojin Zhu

doi:10.1609/aaai.v35i12.17306

Authors

Xuezhou Zhang University of Wisconsin-Madison
Shubham Bharti University of Wisconsin-Madison
Yuzhe Ma University of Wisconsin-Madison
Adish Singla MPI-SWS
Xiaojin Zhu University of Wisconsin-Madison

DOI:

https://doi.org/10.1609/aaai.v35i12.17306

Keywords:

Reinforcement Learning

Abstract

We study the sample complexity of teaching, termed the "teaching dimension" (TDim) in the literature, for the teaching-by-reinforcement paradigm, where the teacher guides the student through rewards. This is distinct from the teaching-by-demonstration paradigm motivated by robotics applications, where the teacher teaches by providing demonstrations of state/action trajectories. The teaching-by-reinforcement paradigm applies to a wider range of real-world settings where a demonstration is inconvenient, but it has not been studied systematically. In this paper, we focus on a specific family of reinforcement learning algorithms, Q-learning, characterize the TDim under different teachers with varying control power over the environment, and present matching optimal teaching algorithms. Our TDim results provide the minimum number of samples needed for reinforcement learning, and we discuss their connections to standard PAC-style RL sample complexity and teaching-by-demonstration sample complexity results. Our teaching algorithms have the potential to speed up RL agent learning in applications where a helpful teacher is available.
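The abstract describes the teaching-by-reinforcement loop only at a high level. As a rough illustration, here is a minimal sketch under several assumptions that are not taken from the paper: the student is a tabular Q-learner, the teacher fully controls which (state, action) pair the student updates and which reward in [-r_max, r_max] it sees, and every taught transition is treated as terminal so the update has no bootstrapped term. It counts how many reward-carrying updates a simple greedy teacher needs before the student's greedy policy matches a target policy. The names `q_update_terminal` and `teach_greedy_policy`, the greedy selection rule, and the parameters `alpha` and `r_max` are hypothetical; this is not the paper's teaching algorithm, and its update count is not the TDim derived there.

```python
import numpy as np


def q_update_terminal(Q, s, a, r, alpha):
    """One tabular Q-learning step on a transition assumed to be terminal,
    so the TD target is just r (no max over next-state values)."""
    Q[s, a] += alpha * (r - Q[s, a])


def teach_greedy_policy(Q0, target, alpha=0.5, r_max=1.0, max_updates=10_000):
    """Hypothetical toy teacher (not the paper's algorithm).

    The teacher chooses (s, a, r) for every update, with rewards bounded in
    [-r_max, r_max], until target[s] is the strict argmax of Q[s] in every
    state. Returns the final Q table and the number of updates delivered.
    """
    Q = np.array(Q0, dtype=float)  # work on a copy of the caller's table
    n_states, n_actions = Q.shape
    assert n_actions >= 2, "need at least two actions to have competitors"
    assert 0.0 < alpha <= 1.0 and r_max > 0.0
    n_updates = 0
    for s in range(n_states):
        a_star = target[s]
        competitors = [a for a in range(n_actions) if a != a_star]
        while True:
            c = max(competitors, key=lambda a: Q[s, a])  # strongest rival action
            if Q[s, a_star] > Q[s, c]:
                break  # target action is now the strict greedy choice in s
            if n_updates >= max_updates:
                raise RuntimeError("update budget exhausted")
            # Immediate change each candidate update makes to the value it touches.
            raise_gain = alpha * (r_max - Q[s, a_star])
            lower_gain = alpha * (Q[s, c] + r_max)
            if raise_gain >= lower_gain:
                q_update_terminal(Q, s, a_star, r_max, alpha)  # reward the target
            else:
                q_update_terminal(Q, s, c, -r_max, alpha)  # punish the rival
            n_updates += 1
    return Q, n_updates


# A zero-initialized 3-state, 2-action student and a target policy.
Q, n = teach_greedy_policy(np.zeros((3, 2)), [1, 0, 1])
print(n, np.argmax(Q, axis=1))
```

In this toy setting a zero-initialized table needs one update per state, whereas a state whose rival action starts at 5.0 (for example `teach_greedy_policy([[5.0, 0.0]], [1])`) takes three downward updates on the rival before the target action wins, which hints at why the student's initialization and the teacher's control power both matter for teaching cost.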
