A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits

Authors

  • Joel Q. L. Chang, Department of Mathematics, National University of Singapore
  • Vincent Y. F. Tan, Department of Mathematics and Department of Electrical and Computer Engineering, National University of Singapore

DOI:

https://doi.org/10.1609/aaai.v36i6.20564

Keywords:

Machine Learning (ML)

Abstract

This paper unifies the design and the analysis of risk-averse Thompson sampling algorithms for the multi-armed bandit problem for a class of risk functionals ρ that are continuous and dominant. We prove generalised concentration bounds for these continuous and dominant risk functionals and show that a wide class of popular risk functionals belongs to this class. Using our newly developed analytical toolkit, we analyse the algorithm ρ-MTS (for multinomial distributions) and prove that it admits asymptotically optimal regret bounds under the CVaR, proportional hazard, and other ubiquitous risk measures. More generally, we prove the asymptotic optimality of ρ-MTS for Bernoulli distributions for a class of risk measures known as empirical distribution performance measures (EDPMs), which includes the well-known mean-variance. Numerical simulations show that the regret bounds incurred by our algorithms are reasonably tight vis-à-vis algorithm-independent lower bounds.
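To illustrate the idea of risk-averse Thompson sampling described in the abstract, here is a minimal sketch for the special case of Bernoulli arms and the CVaR risk functional. It is not the paper's ρ-MTS algorithm (which handles multinomial supports and general continuous, dominant risk functionals); the function names and the choice of Beta posteriors are illustrative assumptions. For a Bernoulli(p) reward, the lower-tail CVaR at level α has the closed form max(0, 1 − (1 − p)/α), so each round we sample a parameter from every arm's Beta posterior, rank arms by the CVaR of the sampled parameter, and update the pulled arm's posterior.

```python
import random

def cvar_bernoulli(p, alpha):
    """Lower-tail CVaR at level alpha of a Bernoulli(p) reward.

    The worst alpha-fraction of outcomes is filled with zeros first,
    giving CVaR = max(0, 1 - (1 - p) / alpha).
    """
    return max(0.0, 1.0 - (1.0 - p) / alpha)

def thompson_cvar(true_ps, alpha=0.3, horizon=2000, seed=0):
    """Thompson sampling sketch: Beta posteriors per Bernoulli arm,
    arms ranked by the CVaR of the posterior sample (not the mean).

    Returns the pull counts per arm over the horizon.
    """
    rng = random.Random(seed)
    k = len(true_ps)
    a = [1.0] * k          # Beta(1, 1) uniform priors: success counts + 1
    b = [1.0] * k          # failure counts + 1
    pulls = [0] * k
    for _ in range(horizon):
        # One posterior sample per arm, scored by the risk functional.
        samples = [rng.betavariate(a[i], b[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: cvar_bernoulli(samples[i], alpha))
        # Pull the chosen arm and update its posterior.
        reward = 1 if rng.random() < true_ps[arm] else 0
        a[arm] += reward
        b[arm] += 1 - reward
        pulls[arm] += 1
    return pulls
```

Since CVaR of a Bernoulli is monotone in p, the CVaR-optimal arm here coincides with the mean-optimal one, and the sketch should concentrate its pulls on the arm with the largest p; the paper's general EDPM setting is where the risk-averse and risk-neutral optimal arms can differ.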

Published

2022-06-28

How to Cite

Chang, J. Q. L., & Tan, V. Y. F. (2022). A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits. Proceedings of the AAAI Conference on Artificial Intelligence, 36(6), 6159-6166. https://doi.org/10.1609/aaai.v36i6.20564

Section

AAAI Technical Track on Machine Learning I