Robust Contextual Bandits via Bootstrapping
Keywords:Sequential Decision Making
AbstractUpper confidence bound (UCB) based contextual bandit algorithms require one to know the tail property of the reward distribution. Unfortunately, such tail property is usually unknown or difficult to specify in real-world applications. Using a tail property heavier than the ground truth leads to a slow learning speed of the contextual bandit algorithm, while using a lighter one may cause the algorithm to diverge. To address this fundamental problem, we develop an estimator (evaluated from historical rewards) for the contextual bandit UCB based on the multiplier bootstrapping technique. We first establish sufficient conditions under which our estimator converges asymptotically to the ground truth of contextual bandit UCB. We further derive a second order correction for our estimator so as to obtain its confidence level with a finite number of rounds. To demonstrate the versatility of the estimator, we apply it to design a BootLinUCB algorithm for the contextual bandit. We prove that the BootLinUCB has a sub-linear regret upper bound and also conduct extensive experiments to validate its superior performance.
How to Cite
Tang, Q., Xie, H., Xia, Y., Lee, J., & Zhu, Q. (2021). Robust Contextual Bandits via Bootstrapping. Proceedings of the AAAI Conference on Artificial Intelligence, 35(13), 12182-12189. https://doi.org/10.1609/aaai.v35i13.17446
AAAI Technical Track on Reasoning under Uncertainty