Inverse Reinforcement Learning From Like-Minded Teachers
DOI:
https://doi.org/10.1609/aaai.v35i10.17110
Keywords:
Imitation Learning & Inverse Reinforcement Learning
Abstract
We study the problem of learning a policy in a Markov decision process (MDP) based on observations of the actions taken by multiple teachers. We assume that the teachers are like-minded in that their reward functions -- while different from each other -- are random perturbations of an underlying reward function. Under this assumption, we demonstrate that inverse reinforcement learning algorithms that satisfy a certain property -- that of matching feature expectations -- yield policies that are approximately optimal with respect to the underlying reward function, and that no algorithm can do better in the worst case. We also show how to efficiently recover the optimal policy when the MDP has one state -- a setting that is akin to multi-armed bandits.
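The abstract's one-state (bandit-like) setting can be illustrated with a minimal sketch. All names, feature vectors, and noise parameters below are hypothetical, chosen only to mirror the setup described: rewards are assumed linear in features, each teacher optimizes a randomly perturbed copy of an underlying weight vector, and the learner's policy matches the teachers' average feature expectations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical bandit instance: each arm has a feature vector, and the
# underlying reward is linear, r(a) = w_true . phi[a].
phi = rng.normal(size=(5, 3))          # 5 arms, 3 features
w_true = np.array([1.0, -0.5, 0.3])    # underlying reward weights

# Like-minded teachers: each acts optimally for a perturbed copy of w_true.
n_teachers = 200
noise = rng.normal(scale=0.2, size=(n_teachers, 3))
teacher_arms = np.argmax(phi @ (w_true + noise).T, axis=0)

# Feature expectations of the teachers' (deterministic) policies,
# averaged across teachers.
mu_teachers = phi[teacher_arms].mean(axis=0)

# A learner policy that matches those feature expectations: the empirical
# mixture over the arms the teachers chose.
policy = np.bincount(teacher_arms, minlength=len(phi)) / n_teachers
mu_policy = policy @ phi
assert np.allclose(mu_policy, mu_teachers)

# Compare the matching policy's value under the *underlying* reward
# to the optimal value.
print(policy @ (phi @ w_true), (phi @ w_true).max())
```

Because the mixture policy reproduces the teachers' average feature expectations exactly, its value under any linear reward equals the teachers' average value; with small perturbations, that average is close to optimal, which is the intuition behind the paper's guarantee.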
Published
2021-05-18
How to Cite
Noothigattu, R., Yan, T., & Procaccia, A. D. (2021). Inverse Reinforcement Learning From Like-Minded Teachers. Proceedings of the AAAI Conference on Artificial Intelligence, 35(10), 9197-9204. https://doi.org/10.1609/aaai.v35i10.17110
Section
AAAI Technical Track on Machine Learning III