Inverse Reinforcement Learning From Like-Minded Teachers

Authors

  • Ritesh Noothigattu, Carnegie Mellon University
  • Tom Yan, Carnegie Mellon University
  • Ariel D. Procaccia, Harvard University

DOI

https://doi.org/10.1609/aaai.v35i10.17110

Keywords

Imitation Learning & Inverse Reinforcement Learning

Abstract

We study the problem of learning a policy in a Markov decision process (MDP) based on observations of the actions taken by multiple teachers. We assume that the teachers are like-minded in that their reward functions -- while different from each other -- are random perturbations of an underlying reward function. Under this assumption, we demonstrate that inverse reinforcement learning algorithms that satisfy a certain property -- that of matching feature expectations -- yield policies that are approximately optimal with respect to the underlying reward function, and that no algorithm can do better in the worst case. We also show how to efficiently recover the optimal policy when the MDP has one state -- a setting that is akin to multi-armed bandits.
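
The following is a minimal, hypothetical sketch (not the paper's algorithm or experiments) of the one-state, bandit-like setting described in the abstract: each teacher's reward weights are a random perturbation of an underlying weight vector, each teacher demonstrates their own optimal action, and the learner plays the mixed policy that replays a uniformly random demonstration, whose feature expectation matches the teachers' empirical feature expectations exactly. All names and constants below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical setup (illustrative constants, not from the paper) ---
# One-state MDP: each of k actions has a d-dimensional feature vector,
# and every reward function is linear in features: r_w(a) = w . phi(a).
k, d, n_teachers, noise = 20, 5, 500, 0.3

phi = rng.normal(size=(k, d))      # feature vector phi(a) for each action a
w_star = rng.normal(size=d)        # underlying ("ground truth") reward weights
w_star /= np.linalg.norm(w_star)

# Like-minded teachers: each teacher's reward weights are a random
# perturbation of w_star, and each demonstrates their own optimal action.
w_teachers = w_star + noise * rng.normal(size=(n_teachers, d))
demos = np.argmax(phi @ w_teachers.T, axis=0)   # action chosen by each teacher

# Feature-expectation matching: the mixed policy that replays a uniformly
# random demonstration has feature expectation equal to the empirical mean
# of the demonstrated actions' features, so it matches them exactly.
mu_demo = phi[demos].mean(axis=0)
policy = np.bincount(demos, minlength=k) / n_teachers  # mixture over actions

value_policy = policy @ phi @ w_star   # value of the matching policy under w_star
value_opt = np.max(phi @ w_star)       # value of the optimal action under w_star

print(f"feature gap:   {np.linalg.norm(policy @ phi - mu_demo):.2e}")
print(f"optimal value: {value_opt:.3f}")
print(f"policy value:  {value_policy:.3f}  "
      f"(suboptimality {value_opt - value_policy:.3f})")
```

With many teachers and moderate perturbation noise, the printed suboptimality is small, illustrating the abstract's claim that policies matching feature expectations are approximately optimal with respect to the underlying reward function.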

Published

2021-05-18

How to Cite

Noothigattu, R., Yan, T., & Procaccia, A. D. (2021). Inverse Reinforcement Learning From Like-Minded Teachers. Proceedings of the AAAI Conference on Artificial Intelligence, 35(10), 9197-9204. https://doi.org/10.1609/aaai.v35i10.17110

Issue

Vol. 35 No. 10 (2021)

Section

AAAI Technical Track on Machine Learning III