Inverse Reinforcement Learning From Like-Minded Teachers

Authors

  • Ritesh Noothigattu, Carnegie Mellon University
  • Tom Yan, Carnegie Mellon University
  • Ariel D. Procaccia, Harvard University

DOI

https://doi.org/10.1609/aaai.v35i10.17110

Keywords

Imitation Learning & Inverse Reinforcement Learning

Abstract

We study the problem of learning a policy in a Markov decision process (MDP) based on observations of the actions taken by multiple teachers. We assume that the teachers are like-minded in that their reward functions -- while different from each other -- are random perturbations of an underlying reward function. Under this assumption, we demonstrate that inverse reinforcement learning algorithms that satisfy a certain property -- that of matching feature expectations -- yield policies that are approximately optimal with respect to the underlying reward function, and that no algorithm can do better in the worst case. We also show how to efficiently recover the optimal policy when the MDP has one state -- a setting that is akin to multi-armed bandits.
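
The following is a minimal, hypothetical sketch (not the paper's algorithm or experiments) of the one-state, bandit-like setting described in the abstract: each teacher's reward weights are a random perturbation of an underlying weight vector, each teacher demonstrates their own optimal action, and the learner plays the mixed policy that replays a uniformly random demonstration, whose feature expectation matches the teachers' empirical feature expectations exactly. All names and constants below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical setup (illustrative constants, not from the paper) ---
# One-state MDP: each of k actions has a d-dimensional feature vector,
# and every reward function is linear in features: r_w(a) = w . phi(a).
k, d, n_teachers, noise = 20, 5, 500, 0.3

phi = rng.normal(size=(k, d))      # feature vector phi(a) for each action a
w_star = rng.normal(size=d)        # underlying ("ground truth") reward weights
w_star /= np.linalg.norm(w_star)

# Like-minded teachers: each teacher's reward weights are a random
# perturbation of w_star, and each demonstrates their own optimal action.
w_teachers = w_star + noise * rng.normal(size=(n_teachers, d))
demos = np.argmax(phi @ w_teachers.T, axis=0)   # action chosen by each teacher

# Feature-expectation matching: the mixed policy that replays a uniformly
# random demonstration has feature expectation equal to the empirical mean
# of the demonstrated actions' features, so it matches them exactly.
mu_demo = phi[demos].mean(axis=0)
policy = np.bincount(demos, minlength=k) / n_teachers  # mixture over actions

value_policy = policy @ phi @ w_star   # value of the matching policy under w_star
value_opt = np.max(phi @ w_star)       # value of the optimal action under w_star

print(f"feature gap:   {np.linalg.norm(policy @ phi - mu_demo):.2e}")
print(f"optimal value: {value_opt:.3f}")
print(f"policy value:  {value_policy:.3f}  "
      f"(suboptimality {value_opt - value_policy:.3f})")
```

With many teachers and moderate perturbation noise, the printed suboptimality is small, illustrating the abstract's claim that policies matching feature expectations are approximately optimal with respect to the underlying reward function.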

Published

2021-05-18

How to Cite

Noothigattu, R., Yan, T., & Procaccia, A. D. (2021). Inverse Reinforcement Learning From Like-Minded Teachers. Proceedings of the AAAI Conference on Artificial Intelligence, 35(10), 9197-9204. https://doi.org/10.1609/aaai.v35i10.17110

Issue

Vol. 35 No. 10 (2021)

Section

AAAI Technical Track on Machine Learning III