Inverse Reinforcement Learning through Policy Gradient Minimization

Authors

  • Matteo Pirotta, Politecnico di Milano
  • Marcello Restelli, Politecnico di Milano

DOI:

https://doi.org/10.1609/aaai.v30i1.10313

Keywords:

Reinforcement Learning, Inverse Reinforcement Learning

Abstract

Inverse Reinforcement Learning (IRL) deals with the problem of recovering the reward function optimized by an expert, given a set of demonstrations of the expert's policy. Most IRL algorithms need to repeatedly compute the optimal policy for different reward functions. This paper proposes a new IRL approach that makes it possible to recover the reward function without solving any "direct" RL problem. The idea is to find the reward function that minimizes the gradient of a parameterized representation of the expert's policy. In particular, when the reward function can be represented as a linear combination of some basis functions, we show that this optimization problem can be solved efficiently. We present an empirical evaluation of the proposed approach on a multidimensional version of the Linear-Quadratic Regulator (LQR), both in the case where the parameters of the expert's policy are known and in the (more realistic) case where they must be inferred from the expert's demonstrations. Finally, the algorithm is compared against the state of the art on the mountain-car domain, where the expert's policy is unknown.
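
The sketch below illustrates the core idea of the abstract under simplifying assumptions; it is not the paper's exact algorithm. When the reward is linear in basis functions, R_w(s, a) = wᵀφ(s, a), a REINFORCE-style policy gradient is itself linear in w, i.e. ∇_θ J(θ; w) = A(θ) w, where column j of A is the gradient obtained by using φ_j as the reward. IRL then reduces to choosing w so that the gradient at the expert's policy parameters is (approximately) zero. The function names (estimate_gradient_matrix, recover_reward_weights), the inputs grad_log_pi and phi, and the unit-norm constraint solved via SVD are illustrative choices, not taken from the paper.

```python
import numpy as np

def estimate_gradient_matrix(trajectories, grad_log_pi, phi, gamma=0.99):
    """Estimate A such that grad_theta J(theta; w) = A w, from demonstrations.

    trajectories: list of trajectories, each a list of (state, action) pairs
    grad_log_pi:  function (state, action) -> grad of log pi_theta, shape (d,)
    phi:          function (state, action) -> reward basis features, shape (q,)
    """
    s0, a0 = trajectories[0][0]
    d = grad_log_pi(s0, a0).shape[0]
    q = phi(s0, a0).shape[0]
    A = np.zeros((d, q))
    for traj in trajectories:
        score = np.zeros(d)        # sum of score functions (REINFORCE estimator)
        feat_return = np.zeros(q)  # discounted sum of feature vectors
        for t, (s, a) in enumerate(traj):
            score += grad_log_pi(s, a)
            feat_return += (gamma ** t) * phi(s, a)
        # With r_t = w^T phi_t, the gradient is E[score * feat_return^T] w.
        A += np.outer(score, feat_return)
    return A / len(trajectories)

def recover_reward_weights(A):
    """Minimize ||A w||_2 subject to ||w||_2 = 1 (one way to exclude w = 0).

    The solution is the right singular vector associated with the smallest
    singular value of A; the paper may use a different constraint, e.g.
    weights restricted to the simplex.
    """
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1]
```

In this sketch, the expert's policy parameters enter only through grad_log_pi; when they are unknown, they would first have to be fitted to the demonstrations (e.g. by maximum likelihood), which is the "more realistic" setting mentioned in the abstract.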

Published

2016-03-02

How to Cite

Pirotta, M., & Restelli, M. (2016). Inverse Reinforcement Learning through Policy Gradient Minimization. Proceedings of the AAAI Conference on Artificial Intelligence, 30(1). https://doi.org/10.1609/aaai.v30i1.10313

Issue

Vol. 30 No. 1 (2016)

Section

Technical Papers: Machine Learning Methods