Distance Minimization for Reward Learning from Scored Trajectories


  • Benjamin Burchfiel, Duke University
  • Carlo Tomasi, Duke University
  • Ronald Parr, Duke University




Keywords: Reinforcement Learning, Robotics, Learning from Demonstration, Inverse Reinforcement Learning, Inverse Optimal Control, IRL, IOC, RL


Many planning methods rely on the use of an immediate reward function as a portable and succinct representation of desired behavior. Rewards are often inferred from demonstrated behavior that is assumed to be near-optimal. We examine a framework, Distance Minimization IRL (DM-IRL), for learning reward functions from scores an expert assigns to possibly suboptimal demonstrations. By changing the expert’s role from a demonstrator to a judge, DM-IRL relaxes some of the assumptions present in IRL, enabling learning from the scoring of arbitrary demonstration trajectories with unknown transition functions. DM-IRL complements existing IRL approaches by addressing different assumptions about the expert. We show that DM-IRL is robust to expert scoring error and prove that finding a policy that produces maximally informative trajectories for an expert to score is strongly NP-hard. Experimentally, we demonstrate that the reward function DM-IRL learns from an MDP with an unknown transition model can transfer to an agent with known characteristics in a novel environment, and we achieve successful learning with limited available training data.
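
As a rough illustration of learning a reward from expert scores, the sketch below makes two assumptions that the abstract does not state: the reward is linear in hypothetical per-step feature vectors, and the expert's score for a trajectory approximates its discounted sum of rewards. Under those assumptions, fitting the reward weights reduces to a regularized least-squares problem that minimizes the distance between the expert's scores and the scores the model predicts. Only the observed trajectories are used, so no transition model is needed. The names `discounted_feature_sums`, `fit_reward_weights`, `gamma`, and `ridge` are illustrative and are not taken from the paper; this is not the authors' implementation.

```python
import numpy as np


def discounted_feature_sums(trajectory, gamma):
    """Return the discounted sum of per-step feature vectors.

    trajectory: array-like of shape (T, d), one feature vector per step
                (hypothetical features; the abstract does not specify them).
    gamma:      discount factor (assumed).
    """
    traj = np.asarray(trajectory, dtype=float)
    discounts = gamma ** np.arange(traj.shape[0])  # 1, gamma, gamma^2, ...
    return discounts @ traj                        # shape (d,)


def fit_reward_weights(trajectories, scores, gamma=0.9, ridge=1e-6):
    """Fit linear reward weights w so that Phi @ w is close to the scores.

    Assumes score_i ~ sum_t gamma^t * w . phi(s_t) for trajectory i.
    Solves the ridge-regularized normal equations
    (Phi^T Phi + ridge * I) w = Phi^T scores.
    """
    Phi = np.stack([discounted_feature_sums(t, gamma) for t in trajectories])
    d = Phi.shape[1]
    A = Phi.T @ Phi + ridge * np.eye(d)
    b = Phi.T @ np.asarray(scores, dtype=float)
    return np.linalg.solve(A, b)
```

Given a list of scored trajectories, `fit_reward_weights(trajectories, scores)` returns one weight per feature; the small `ridge` term keeps the system solvable when there are few trajectories or the features are nearly collinear.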




How to Cite

Burchfiel, B., Tomasi, C., & Parr, R. (2016). Distance Minimization for Reward Learning from Scored Trajectories. Proceedings of the AAAI Conference on Artificial Intelligence, 30(1). https://doi.org/10.1609/aaai.v30i1.10411