A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits

Authors

  • Joel Q. L. Chang, Department of Mathematics, National University of Singapore
  • Vincent Y. F. Tan, Department of Mathematics and Department of Electrical and Computer Engineering, National University of Singapore

DOI:

https://doi.org/10.1609/aaai.v36i6.20564

Keywords:

Machine Learning (ML)

Abstract

This paper unifies the design and the analysis of risk-averse Thompson sampling algorithms for the multi-armed bandit problem for a class of risk functionals ρ that are continuous and dominant. We prove generalised concentration bounds for these continuous and dominant risk functionals and show that a wide class of popular risk functionals belongs to this class. Using our newly developed analytical toolkit, we analyse the algorithm ρ-MTS (for multinomial distributions) and prove that it admits asymptotically optimal regret bounds under the CVaR, proportional hazard, and other ubiquitous risk measures. More generally, we prove the asymptotic optimality of ρ-MTS for Bernoulli distributions for a class of risk measures known as empirical distribution performance measures (EDPMs), which includes the well-known mean-variance. Numerical simulations show that the regret bounds incurred by our algorithms are reasonably tight vis-à-vis algorithm-independent lower bounds.
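To illustrate the idea of risk-averse Thompson sampling described in the abstract, here is a minimal sketch for the special case of Bernoulli arms and the CVaR risk functional. It is not the paper's ρ-MTS algorithm (which handles multinomial supports and general continuous, dominant risk functionals); the function names and the choice of Beta posteriors are illustrative assumptions. For a Bernoulli(p) reward, the lower-tail CVaR at level α has the closed form max(0, 1 − (1 − p)/α), so each round we sample a parameter from every arm's Beta posterior, rank arms by the CVaR of the sampled parameter, and update the pulled arm's posterior.

```python
import random

def cvar_bernoulli(p, alpha):
    """Lower-tail CVaR at level alpha of a Bernoulli(p) reward.

    The worst alpha-fraction of outcomes is filled with zeros first,
    giving CVaR = max(0, 1 - (1 - p) / alpha).
    """
    return max(0.0, 1.0 - (1.0 - p) / alpha)

def thompson_cvar(true_ps, alpha=0.3, horizon=2000, seed=0):
    """Thompson sampling sketch: Beta posteriors per Bernoulli arm,
    arms ranked by the CVaR of the posterior sample (not the mean).

    Returns the pull counts per arm over the horizon.
    """
    rng = random.Random(seed)
    k = len(true_ps)
    a = [1.0] * k          # Beta(1, 1) uniform priors: success counts + 1
    b = [1.0] * k          # failure counts + 1
    pulls = [0] * k
    for _ in range(horizon):
        # One posterior sample per arm, scored by the risk functional.
        samples = [rng.betavariate(a[i], b[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: cvar_bernoulli(samples[i], alpha))
        # Pull the chosen arm and update its posterior.
        reward = 1 if rng.random() < true_ps[arm] else 0
        a[arm] += reward
        b[arm] += 1 - reward
        pulls[arm] += 1
    return pulls
```

Since CVaR of a Bernoulli is monotone in p, the CVaR-optimal arm here coincides with the mean-optimal one, and the sketch should concentrate its pulls on the arm with the largest p; the paper's general EDPM setting is where the risk-averse and risk-neutral optimal arms can differ.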

Published

2022-06-28

How to Cite

Chang, J. Q. L., & Tan, V. Y. F. (2022). A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits. Proceedings of the AAAI Conference on Artificial Intelligence, 36(6), 6159-6166. https://doi.org/10.1609/aaai.v36i6.20564

Section

AAAI Technical Track on Machine Learning I