TY - JOUR
AU - Ou, Mingdong
AU - Li, Nan
AU - Yang, Cheng
AU - Zhu, Shenghuo
AU - Jin, Rong
PY - 2019/07/17
Y2 - 2024/07/19
TI - Semi-Parametric Sampling for Stochastic Bandits with Many Arms
JF - Proceedings of the AAAI Conference on Artificial Intelligence
JA - AAAI
VL - 33
IS - 01
SE - AAAI Technical Track: Reasoning under Uncertainty
DO - 10.1609/aaai.v33i01.33017933
UR - https://ojs.aaai.org/index.php/AAAI/article/view/4793
SP - 7933-7940
AB - <p>We consider the stochastic bandit problem with a large candidate arm set. In this setting, classic multi-armed bandit algorithms, which assume independence among arms and adopt a non-parametric reward model, are inefficient due to the large number of arms. By exploiting arm correlations based on a parametric reward model with arm features, contextual bandit algorithms are more efficient, but they can also suffer large regret in practical applications due to reward estimation bias from mis-specified model assumptions or incomplete features. In this paper, we propose a novel Bayesian framework, called <em>Semi-Parametric Sampling</em> (SPS), for this problem, which employs a semi-parametric function as the reward model. Specifically, the parametric part of SPS, which models the expected reward as a parametric function of the arm feature, can efficiently eliminate poor arms from the candidate set. The non-parametric part of SPS, which adopts a non-parametric reward model, revises the parametric estimation to avoid estimation bias, especially on the remaining candidate arms. We give an implementation of SPS, <em>Linear SPS</em> (LSPS), which utilizes a linear function as the parametric part. In the semi-parametric environment, theoretical analysis shows that LSPS achieves a better regret bound (i.e., <em>Õ</em>(√(<em>N</em><sup>1−<em>α</em></sup><em>d</em><sup><em>α</em></sup>) √<em>T</em>) with <em>α</em> ∈ [0, 1]) than existing approaches. Experiments also demonstrate the superiority of the proposed approach.</p>
ER -