Robust Contextual Bandits via Bootstrapping

Qiao Tang; Hong Xie; Yunni Xia; Jia Lee; Qingsheng Zhu

doi:10.1609/aaai.v35i13.17446

Authors

Qiao Tang Chongqing Key Laboratory of Software Theory and Technology, Chongqing University
Hong Xie Chongqing Key Laboratory of Software Theory and Technology, Chongqing University
Yunni Xia Chongqing Key Laboratory of Software Theory and Technology, Chongqing University
Jia Lee Chongqing Key Laboratory of Software Theory and Technology, Chongqing University
Qingsheng Zhu Chongqing Key Laboratory of Software Theory and Technology, Chongqing University

DOI:

https://doi.org/10.1609/aaai.v35i13.17446

Keywords:

Sequential Decision Making

Abstract

Upper confidence bound (UCB) based contextual bandit algorithms require one to know the tail property of the reward distribution. Unfortunately, such tail property is usually unknown or difficult to specify in real-world applications. Using a tail property heavier than the ground truth leads to a slow learning speed of the contextual bandit algorithm, while using a lighter one may cause the algorithm to diverge. To address this fundamental problem, we develop an estimator (evaluated from historical rewards) for the contextual bandit UCB based on the multiplier bootstrapping technique. We first establish sufficient conditions under which our estimator converges asymptotically to the ground truth of contextual bandit UCB. We further derive a second order correction for our estimator so as to obtain its confidence level with a finite number of rounds. To demonstrate the versatility of the estimator, we apply it to design a BootLinUCB algorithm for the contextual bandit. We prove that the BootLinUCB has a sub-linear regret upper bound and also conduct extensive experiments to validate its superior performance.

Robust Contextual Bandits via Bootstrapping

Authors

DOI:

Keywords:

Abstract

Downloads

Published

How to Cite

Issue

Section

Information