Distance Minimization for Reward Learning from Scored Trajectories

Authors

  • Benjamin Burchfiel, Duke University
  • Carlo Tomasi, Duke University
  • Ronald Parr, Duke University

DOI:

https://doi.org/10.1609/aaai.v30i1.10411

Keywords:

Reinforcement Learning, Robotics, Learning from Demonstration, Inverse Reinforcement Learning, Inverse Optimal Control, IRL, IOC, RL

Abstract

Many planning methods rely on an immediate reward function as a portable and succinct representation of desired behavior. Rewards are often inferred from demonstrated behavior that is assumed to be near-optimal. We examine a framework, Distance Minimization IRL (DM-IRL), for learning reward functions from scores an expert assigns to possibly suboptimal demonstrations. By changing the expert’s role from a demonstrator to a judge, DM-IRL relaxes some of the assumptions present in IRL, enabling learning from the scoring of arbitrary demonstration trajectories even when the transition function is unknown. DM-IRL complements existing IRL approaches by addressing different assumptions about the expert. We show that DM-IRL is robust to expert scoring error and prove that finding a policy that produces maximally informative trajectories for an expert to score is strongly NP-hard. Experimentally, we demonstrate that the reward function DM-IRL learns in an MDP with an unknown transition model can transfer to an agent with known characteristics in a novel environment, and that learning succeeds with limited training data.
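To make the core idea concrete, the sketch below illustrates one simple reading of reward learning from scored trajectories: if the reward is assumed linear in state features, an expert's score for a trajectory should roughly equal the trajectory's discounted feature counts dotted with the reward weights, and fitting those weights reduces to a (regularized) least-squares regression. The feature map, discount factor, ridge term, and toy data are illustrative assumptions, not the paper's exact formulation.

    # Illustrative sketch (not the paper's exact algorithm): regress expert
    # scores onto discounted feature counts to recover linear reward weights.
    import numpy as np

    def discounted_feature_counts(trajectory, feature_fn, gamma=0.95):
        """Sum of gamma^t * phi(s_t) over a trajectory (a list of states)."""
        counts = None
        for t, state in enumerate(trajectory):
            phi = feature_fn(state)
            counts = (gamma ** t) * phi if counts is None else counts + (gamma ** t) * phi
        return counts

    def fit_reward_weights(trajectories, scores, feature_fn, gamma=0.95, ridge=1e-3):
        """Least-squares fit of w so that w . features(tau_i) ~= score_i."""
        X = np.stack([discounted_feature_counts(tau, feature_fn, gamma) for tau in trajectories])
        y = np.asarray(scores, dtype=float)
        # A small ridge term keeps the problem well-posed when few scored
        # trajectories are available.
        A = X.T @ X + ridge * np.eye(X.shape[1])
        return np.linalg.solve(A, X.T @ y)

    # Toy usage: states are 2-D points, features are the coordinates themselves,
    # and scores come from a hidden weight vector plus scoring noise.
    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        feature_fn = lambda s: np.asarray(s, dtype=float)
        true_w = np.array([1.0, -0.5])
        trajs = [rng.normal(size=(10, 2)) for _ in range(20)]
        noisy_scores = [discounted_feature_counts(t, feature_fn) @ true_w
                        + rng.normal(scale=0.1) for t in trajs]
        w_hat = fit_reward_weights(trajs, noisy_scores, feature_fn)
        print("recovered weights:", w_hat)

Because the regression never needs to simulate the environment, a fit of this kind can use arbitrary, possibly suboptimal trajectories and does not require a known transition model, which is the property the abstract highlights.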

Published

2016-03-05

How to Cite

Burchfiel, B., Tomasi, C., & Parr, R. (2016). Distance Minimization for Reward Learning from Scored Trajectories. Proceedings of the AAAI Conference on Artificial Intelligence, 30(1). https://doi.org/10.1609/aaai.v30i1.10411