A Distributional Framework for Risk-Sensitive End-to-End Planning in Continuous MDPs

Authors

  • Noah Patton, University of Toronto
  • Jihwan Jeong, University of Toronto
  • Mike Gimelfarb, University of Toronto; Vector Institute
  • Scott Sanner, University of Toronto; Vector Institute

DOI:

https://doi.org/10.1609/aaai.v36i9.21226

Keywords:

Planning, Routing, and Scheduling (PRS)

Abstract

Recent advances in efficient planning in deterministic or stochastic high-dimensional domains with continuous action spaces leverage backpropagation through a model of the environment to directly optimize action sequences. However, existing methods typically do not take risk into account when optimizing in stochastic domains, even though risk can be incorporated efficiently in MDPs by optimizing a nonlinear utility function of the return distribution. We bridge this gap by introducing Risk-Aware Planning using PyTorch (RAPTOR), a novel unified framework for risk-sensitive planning through end-to-end optimization of commonly studied risk-sensitive utility functions such as entropic utility, mean-variance optimization, and conditional value at risk (CVaR). A key technical difficulty of our approach is that direct optimization of general risk-sensitive utility functions by backpropagation is impossible due to the presence of environment stochasticity. The novelty of RAPTOR lies in leveraging reparameterization of the state distribution, leading to a unique distributional perspective of end-to-end planning in which the return distribution is used both for sampling and for optimizing risk-aware objectives by backpropagation in a unified framework. We evaluate and compare RAPTOR on three highly stochastic MDPs (nonlinear navigation, HVAC control, and linear reservoir control), demonstrating the ability of RAPTOR to manage risk in complex continuous domains according to different notions of risk-sensitive utility.
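
As a rough illustration of the three utilities named in the abstract, the sketch below estimates each one from a batch of sampled returns. It is not the RAPTOR implementation. The conventions are assumptions taken from common usage rather than from this page: entropic utility as (1/beta) log E[exp(beta R)] with beta < 0 being risk-averse, mean-variance as E[R] - lam Var[R], and CVaR as the mean of the worst alpha fraction of returns. The toy return model in `reparameterized_returns` and all function names are hypothetical; it only shows the reparameterization idea that noise is drawn first and the return is then a deterministic function of the action and that noise. Plain Python is used so the sketch runs without dependencies; it does not compute gradients.

```python
import math
import random


def entropic_utility(returns, beta):
    """Sample estimate of (1/beta) * log E[exp(beta * R)]; beta must be nonzero.

    Assumed convention: beta < 0 is risk-averse.
    """
    scaled = [beta * r for r in returns]
    shift = max(scaled)  # log-sum-exp shift for numerical stability
    mean_exp = sum(math.exp(s - shift) for s in scaled) / len(scaled)
    return (shift + math.log(mean_exp)) / beta


def mean_variance(returns, lam):
    """Sample estimate of E[R] - lam * Var[R] (population variance)."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / n
    return mean - lam * var


def cvar(returns, alpha):
    """Mean of the worst ceil(alpha * n) sampled returns, 0 < alpha <= 1."""
    k = max(1, math.ceil(alpha * len(returns)))
    worst = sorted(returns)[:k]
    return sum(worst) / k


def reparameterized_returns(action, noise):
    """Hypothetical toy model: return = action * (1 + eps).

    The noise is sampled outside this function, so the return is a
    deterministic function of the action given the noise.
    """
    return [action * (1.0 + eps) for eps in noise]


if __name__ == "__main__":
    random.seed(0)
    noise = [random.gauss(0.0, 1.0) for _ in range(1000)]
    for action in (0.5, 2.0):
        rets = reparameterized_returns(action, noise)
        print(
            action,
            entropic_utility(rets, beta=-1.0),
            mean_variance(rets, lam=1.0),
            cvar(rets, alpha=0.1),
        )
```

In this toy model a larger action raises both the mean and the spread of the return, so a risk-neutral objective and the risk-sensitive ones above can rank the two actions differently. In an automatic-differentiation framework such as PyTorch, the same fixed-noise structure is what would let gradients of these estimates flow back to the action.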

Published

2022-06-28

How to Cite

Patton, N., Jeong, J., Gimelfarb, M., & Sanner, S. (2022). A Distributional Framework for Risk-Sensitive End-to-End Planning in Continuous MDPs. Proceedings of the AAAI Conference on Artificial Intelligence, 36(9), 9894-9901. https://doi.org/10.1609/aaai.v36i9.21226

Section

AAAI Technical Track on Planning, Routing, and Scheduling