Robust Contextual Bandits via Bootstrapping

Authors

  • Qiao Tang Chongqing Key Laboratory of Software Theory and Technology, Chongqing University
  • Hong Xie Chongqing Key Laboratory of Software Theory and Technology, Chongqing University
  • Yunni Xia Chongqing Key Laboratory of Software Theory and Technology, Chongqing University
  • Jia Lee Chongqing Key Laboratory of Software Theory and Technology, Chongqing University
  • Qingsheng Zhu Chongqing Key Laboratory of Software Theory and Technology, Chongqing University

DOI:

https://doi.org/10.1609/aaai.v35i13.17446

Keywords:

Sequential Decision Making

Abstract

Upper confidence bound (UCB) based contextual bandit algorithms require one to know the tail property of the reward distribution. Unfortunately, such tail property is usually unknown or difficult to specify in real-world applications. Using a tail property heavier than the ground truth leads to a slow learning speed of the contextual bandit algorithm, while using a lighter one may cause the algorithm to diverge. To address this fundamental problem, we develop an estimator (evaluated from historical rewards) for the contextual bandit UCB based on the multiplier bootstrapping technique. We first establish sufficient conditions under which our estimator converges asymptotically to the ground truth of contextual bandit UCB. We further derive a second order correction for our estimator so as to obtain its confidence level with a finite number of rounds. To demonstrate the versatility of the estimator, we apply it to design a BootLinUCB algorithm for the contextual bandit. We prove that the BootLinUCB has a sub-linear regret upper bound and also conduct extensive experiments to validate its superior performance.

Downloads

Published

2021-05-18

How to Cite

Tang, Q., Xie, H., Xia, Y., Lee, J., & Zhu, Q. (2021). Robust Contextual Bandits via Bootstrapping. Proceedings of the AAAI Conference on Artificial Intelligence, 35(13), 12182-12189. https://doi.org/10.1609/aaai.v35i13.17446

Issue

Section

AAAI Technical Track on Reasoning under Uncertainty