Reward-Weighted Regression Converges to a Global Optimum

Authors

  • Miroslav Štrupl, The Swiss AI Lab IDSIA, USI, SUPSI
  • Francesco Faccio, The Swiss AI Lab IDSIA, USI, SUPSI
  • Dylan R. Ashley, The Swiss AI Lab IDSIA, USI, SUPSI
  • Rupesh Kumar Srivastava, NNAISENSE
  • Jürgen Schmidhuber, The Swiss AI Lab IDSIA, USI, SUPSI; NNAISENSE; King Abdullah University of Science and Technology (KAUST)

DOI:

https://doi.org/10.1609/aaai.v36i8.20811

Keywords:

Machine Learning (ML)

Abstract

Reward-Weighted Regression (RWR) belongs to a family of widely known iterative Reinforcement Learning algorithms based on the Expectation-Maximization framework. In this family, learning at each iteration consists of sampling a batch of trajectories using the current policy and fitting a new policy to maximize a return-weighted log-likelihood of actions. Although RWR is known to yield monotonic improvement of the policy under certain circumstances, whether and under which conditions RWR converges to the optimal policy have remained open questions. In this paper, we provide the first proof that RWR converges to a global optimum when no function approximation is used, in a general compact setting. Furthermore, for the simpler case with finite state and action spaces, we prove R-linear convergence of the state-value function to the optimum.
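To make the no-function-approximation case described in the abstract concrete, below is a minimal sketch of a tabular RWR-style iteration on a toy MDP: the current policy is evaluated exactly, reweighted by its own action values, and renormalized, i.e. pi_{k+1}(a|s) is proportional to pi_k(a|s) * Q^{pi_k}(s, a). The toy MDP, variable names, and this particular form of the update are illustrative assumptions for exposition, not the authors' code or exact operator.

import numpy as np

# Toy finite MDP (illustrative assumption): random transition kernel and
# nonnegative rewards, so the reweighting below keeps probabilities valid.
n_states, n_actions = 3, 2
gamma = 0.9

rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))              # R[s, a]

def evaluate(pi):
    """Exact policy evaluation: solve (I - gamma * P_pi) V = R_pi, then form Q."""
    P_pi = np.einsum("sa,sat->st", pi, P)
    R_pi = np.einsum("sa,sa->s", pi, R)
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
    Q = R + gamma * np.einsum("sat,t->sa", P, V)
    return V, Q

# Start from the uniform policy and apply the reward-weighted update repeatedly.
pi = np.full((n_states, n_actions), 1.0 / n_actions)
for k in range(200):
    V, Q = evaluate(pi)
    pi = pi * Q                                   # reweight by action values
    pi /= pi.sum(axis=1, keepdims=True)           # renormalize per state

print("Value function after RWR-style iterations:", evaluate(pi)[0])

Under this kind of update, actions with above-average value gain probability mass at each iteration, which is the monotonic-improvement behavior the abstract refers to; the paper's contribution is showing that, in the exact setting, the iteration in fact converges to a globally optimal policy.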

Published

2022-06-28

How to Cite

Štrupl, M., Faccio, F., Ashley, D. R., Srivastava, R. K., & Schmidhuber, J. (2022). Reward-Weighted Regression Converges to a Global Optimum. Proceedings of the AAAI Conference on Artificial Intelligence, 36(8), 8361-8369. https://doi.org/10.1609/aaai.v36i8.20811

Section

AAAI Technical Track on Machine Learning III