Optimizing the CVaR via Sampling

Aviv Tamar; Yonatan Glassner; Shie Mannor

doi:10.1609/aaai.v29i1.9561

Authors

Aviv Tamar, Technion
Yonatan Glassner, Technion
Shie Mannor, Technion

DOI:

https://doi.org/10.1609/aaai.v29i1.9561

Keywords:

CVaR, Likelihood Ratio Method, Reinforcement Learning, MDP

Abstract

Conditional Value at Risk (CVaR) is a prominent risk measure that is used extensively in various domains. We develop a new formula for the gradient of the CVaR in the form of a conditional expectation. Based on this formula, we propose a novel sampling-based estimator for the gradient of the CVaR, in the spirit of the likelihood-ratio method. We analyze the bias of the estimator, and prove the convergence of a corresponding stochastic gradient descent algorithm to a local CVaR optimum. Our method makes it possible to consider CVaR optimization in new domains. As an example, we consider a reinforcement learning application, and learn a risk-sensitive controller for the game of Tetris.
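
The abstract does not state the gradient formula itself, so the following is only a minimal sketch of what a likelihood-ratio style CVaR gradient estimator can look like, under stated assumptions. It assumes a toy model in which a sample X is drawn from a Gaussian N(theta, 1), the reward equals the sample, and the CVaR is taken over the worst alpha-fraction (lower tail) of rewards. The function name `cvar_gradient_estimate` and all parameter names are hypothetical and are not taken from the paper. The estimate averages the score function times (reward minus empirical VaR) over the samples that fall in the tail, divided by alpha times the sample count.

```python
import math
import random


def cvar_gradient_estimate(theta, alpha, n_samples, rng):
    """Sampling-based estimate of d/dtheta CVaR_alpha(R) for a toy model.

    Assumptions (illustrative only): X ~ N(theta, 1), reward R = X,
    and CVaR is the mean of the worst alpha-fraction (lower tail) of R.
    """
    xs = [rng.gauss(theta, 1.0) for _ in range(n_samples)]
    rewards = xs  # toy assumption: the reward is the sample itself

    # Empirical alpha-quantile of the rewards (a plug-in VaR estimate).
    k = math.ceil(alpha * n_samples)
    var_hat = sorted(rewards)[k - 1]

    total = 0.0
    for x, r in zip(xs, rewards):
        if r <= var_hat:  # keep only samples in the lower tail
            score = x - theta  # d/dtheta log N(x; theta, 1)
            total += score * (r - var_hat)
    return total / (alpha * n_samples)


rng = random.Random(0)
grad = cvar_gradient_estimate(theta=0.0, alpha=0.05, n_samples=200_000, rng=rng)
print(grad)
```

In this toy model, shifting theta shifts the whole reward distribution by the same amount, so the true gradient of the CVaR with respect to theta is 1, and the printed estimate should be close to that value. Because the quantile is estimated from the same samples, the estimate is biased at finite sample sizes, which is the kind of effect the bias analysis mentioned in the abstract addresses.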

Published

2015-02-21

How to Cite

Tamar, A., Glassner, Y., & Mannor, S. (2015). Optimizing the CVaR via Sampling. Proceedings of the AAAI Conference on Artificial Intelligence, 29(1). https://doi.org/10.1609/aaai.v29i1.9561

Issue

Vol. 29 No. 1 (2015): Twenty-Ninth AAAI Conference on Artificial Intelligence

Section

Main Track: Novel Machine Learning Algorithms
