Inverse Reinforcement Learning through Policy Gradient Minimization

Authors

  • Matteo Pirotta, Politecnico di Milano
  • Marcello Restelli, Politecnico di Milano

DOI:

https://doi.org/10.1609/aaai.v30i1.10313

Keywords:

Reinforcement Learning, Inverse Reinforcement Learning

Abstract

Inverse Reinforcement Learning (IRL) deals with the problem of recovering the reward function optimized by an expert, given a set of demonstrations of the expert's policy. Most IRL algorithms need to repeatedly compute the optimal policy for different reward functions. This paper proposes a new IRL approach that makes it possible to recover the reward function without solving any "direct" RL problem. The idea is to find the reward function that minimizes the gradient of a parameterized representation of the expert's policy. In particular, when the reward function can be represented as a linear combination of some basis functions, we show that this optimization problem can be solved efficiently. We present an empirical evaluation of the proposed approach on a multidimensional version of the Linear-Quadratic Regulator (LQR), both in the case where the parameters of the expert's policy are known and in the (more realistic) case where they must be inferred from the expert's demonstrations. Finally, the algorithm is compared against the state of the art on the mountain-car domain, where the expert's policy is unknown.
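
The sketch below illustrates the core idea of the abstract under simplifying assumptions; it is not the paper's exact algorithm. When the reward is linear in basis functions, R_w(s, a) = wᵀφ(s, a), a REINFORCE-style policy gradient is itself linear in w, i.e. ∇_θ J(θ; w) = A(θ) w, where column j of A is the gradient obtained by using φ_j as the reward. IRL then reduces to choosing w so that the gradient at the expert's policy parameters is (approximately) zero. The function names (estimate_gradient_matrix, recover_reward_weights), the inputs grad_log_pi and phi, and the unit-norm constraint solved via SVD are illustrative choices, not taken from the paper.

```python
import numpy as np

def estimate_gradient_matrix(trajectories, grad_log_pi, phi, gamma=0.99):
    """Estimate A such that grad_theta J(theta; w) = A w, from demonstrations.

    trajectories: list of trajectories, each a list of (state, action) pairs
    grad_log_pi:  function (state, action) -> grad of log pi_theta, shape (d,)
    phi:          function (state, action) -> reward basis features, shape (q,)
    """
    s0, a0 = trajectories[0][0]
    d = grad_log_pi(s0, a0).shape[0]
    q = phi(s0, a0).shape[0]
    A = np.zeros((d, q))
    for traj in trajectories:
        score = np.zeros(d)        # sum of score functions (REINFORCE estimator)
        feat_return = np.zeros(q)  # discounted sum of feature vectors
        for t, (s, a) in enumerate(traj):
            score += grad_log_pi(s, a)
            feat_return += (gamma ** t) * phi(s, a)
        # With r_t = w^T phi_t, the gradient is E[score * feat_return^T] w.
        A += np.outer(score, feat_return)
    return A / len(trajectories)

def recover_reward_weights(A):
    """Minimize ||A w||_2 subject to ||w||_2 = 1 (one way to exclude w = 0).

    The solution is the right singular vector associated with the smallest
    singular value of A; the paper may use a different constraint, e.g.
    weights restricted to the simplex.
    """
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1]
```

In this sketch, the expert's policy parameters enter only through grad_log_pi; when they are unknown, they would first have to be fitted to the demonstrations (e.g. by maximum likelihood), which is the "more realistic" setting mentioned in the abstract.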

Published

2016-03-02

How to Cite

Pirotta, M., & Restelli, M. (2016). Inverse Reinforcement Learning through Policy Gradient Minimization. Proceedings of the AAAI Conference on Artificial Intelligence, 30(1). https://doi.org/10.1609/aaai.v30i1.10313

Issue

Vol. 30 No. 1 (2016)

Section

Technical Papers: Machine Learning Methods