Weighted Bandits or: How Bandits Learn Distorted Values That Are Not Expected

Authors

  • Aditya Gopalan, Indian Institute of Science
  • Prashanth L. A., University of Maryland
  • Michael Fu, University of Maryland
  • Steve Marcus, University of Maryland

DOI:

https://doi.org/10.1609/aaai.v31i1.10922

Keywords:

Multi-armed bandits, Linear Bandits, Cumulative Prospect Theory

Abstract

Motivated by models of human decision making proposed to explain commonly observed deviations from conventional expected-value preferences, we formulate two stochastic multi-armed bandit problems with distorted probabilities on the cost distributions: the classic K-armed bandit and the linearly parameterized bandit. In both settings, we propose algorithms that are inspired by Upper Confidence Bound (UCB) algorithms, incorporate cost distortions, and exhibit sublinear regret assuming Hölder-continuous weight-distortion functions. For the K-armed setting, we show that the algorithm, called W-UCB, achieves a problem-dependent regret bound of O(L²M² log n / Δ^(2/α − 1)), where n is the number of plays, Δ is the gap in distorted expected value between the best and next-best arm, L and α are the Hölder constants of the distortion function, and M is an upper bound on costs, as well as a problem-independent regret bound of O((KL²M²)^(α/2) n^((2 − α)/2)). We also present a matching lower bound on the regret, showing that the regret of W-UCB is essentially unimprovable over the class of Hölder-continuous weight distortions. For the linearly parameterized setting, we develop a new algorithm, a variant of the Optimism in the Face of Uncertainty Linear bandit (OFUL) algorithm called WOFUL (Weight-distorted OFUL), and show that it has regret O(d√n polylog(n)) with high probability, for sub-Gaussian cost distributions.
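To make the W-UCB idea concrete, the following is a minimal sketch (not the paper's exact algorithm). It assumes the standard empirical estimator of a weight-distorted cost: with order statistics X_(1) ≤ … ≤ X_(n) of samples in [0, M], the distorted value ∫ w(P(X > x)) dx is estimated by Σ X_(i) [w((n−i+1)/n) − w((n−i)/n)]. The exploration bonus L·M·(2 log t / n)^(α/2) is a plausible Hölder-scaled confidence width, not necessarily the paper's exact constant; since arms are costs, the policy plays the arm with the smallest lower index.

```python
import math
import random

def empirical_distorted_value(samples, w):
    """Estimate the weight-distorted expectation of a nonnegative cost
    from samples, using the order-statistic form of int w(P(X > x)) dx:
    sum over sorted samples x_(i) of x_(i) * [w((n-i+1)/n) - w((n-i)/n)]."""
    xs = sorted(samples)
    n = len(xs)
    return sum(x * (w((n - i) / n) - w((n - i - 1) / n))
               for i, x in enumerate(xs))

def w_ucb(arms, w, horizon, L=1.0, alpha=1.0, M=1.0, seed=0):
    """Sketch of a W-UCB-style index policy for K arms with costs in [0, M].
    Each arm's index is its empirical distorted value minus an assumed
    exploration bonus L*M*(2 log t / n_k)^(alpha/2); the arm with the
    smallest index is played, since costs are being minimized.
    Returns the number of plays of each arm."""
    rng = random.Random(seed)
    samples = [[arm(rng)] for arm in arms]  # play each arm once to initialize
    for t in range(len(arms), horizon):
        idx = min(
            range(len(arms)),
            key=lambda k: empirical_distorted_value(samples[k], w)
            - L * M * (2 * math.log(t + 1) / len(samples[k])) ** (alpha / 2),
        )
        samples[idx].append(arms[idx](rng))
    return [len(s) for s in samples]
```

With the identity weight w(p) = p, `empirical_distorted_value` reduces to the sample mean, so the sketch recovers an ordinary UCB-style policy; a concave/convex w (as in cumulative prospect theory) over- or under-weights tail probabilities instead.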

Published

2017-02-13

How to Cite

Gopalan, A., L. A., P., Fu, M., & Marcus, S. (2017). Weighted Bandits or: How Bandits Learn Distorted Values That Are Not Expected. Proceedings of the AAAI Conference on Artificial Intelligence, 31(1). https://doi.org/10.1609/aaai.v31i1.10922