Semi-Parametric Sampling for Stochastic Bandits with Many Arms

Mingdong Ou; Nan Li; Cheng Yang; Shenghuo Zhu; Rong Jin

doi:10.1609/aaai.v33i01.33017933

Authors

Mingdong Ou Alibaba
Nan Li Alibaba Group
Cheng Yang Alibaba Group
Shenghuo Zhu Alibaba Group
Rong Jin Alibaba Group

Abstract

We consider the stochastic bandit problem with a large candidate arm set. In this setting, classic multi-armed bandit algorithms, which assume independence among arms and adopt a non-parametric reward model, are inefficient because of the large number of arms. Contextual bandit algorithms exploit arm correlations through a parametric reward model over arm features, which makes them more efficient. However, they can still suffer large regret in practical applications because mis-specified model assumptions or incomplete features bias their reward estimates. In this paper, we propose a novel Bayesian framework for this problem, called Semi-Parametric Sampling (SPS), which uses a semi-parametric function as the reward model. Specifically, the parametric part of SPS, which models the expected reward as a parametric function of the arm features, can efficiently eliminate poor arms from the candidate set. The non-parametric part of SPS, which adopts a non-parametric reward model, revises the parametric estimate to avoid estimation bias, especially on the remaining candidate arms. We give an implementation of SPS, Linear SPS (LSPS), which uses a linear function as the parametric part. In a semi-parametric environment, theoretical analysis shows that LSPS achieves a better regret bound than existing approaches, namely $\tilde{O}(\sqrt{N^{1-\alpha} d^{\alpha} T})$ with $\alpha \in [0, 1]$, where $N$ is the number of arms, $d$ the feature dimension, and $T$ the time horizon. Experiments also demonstrate the superiority of the proposed approach.
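To make the split between a parametric part and a non-parametric correction concrete, here is a minimal sketch of Thompson sampling under one assumed semi-parametric model: the reward of arm a is r = x_a^T θ + β_a + noise, with a shared linear parameter θ and a per-arm offset β_a. This is not the authors' LSPS algorithm. The function name `semi_parametric_ts`, the Gaussian priors, and the parameters `lam`, `tau`, and `sigma` are hypothetical choices made for illustration. The sketch fits θ and all β_a jointly as Bayesian linear regression on augmented features z_a = [x_a, e_a].

```python
import numpy as np


def semi_parametric_ts(features, pull, horizon, lam=1.0, tau=1.0, sigma=1.0, seed=0):
    """Hypothetical sketch: Thompson sampling for r = x_a^T theta + beta_a + noise.

    Assumed priors (not from the paper): theta ~ N(0, I / lam) is the
    parametric (linear) part; beta_a ~ N(0, tau^2) is a per-arm
    non-parametric correction; observation noise is N(0, sigma^2).
    """
    rng = np.random.default_rng(seed)
    features = np.asarray(features, dtype=float)
    n_arms, d = features.shape
    # Augmented design: arm a -> [x_a, one-hot(a)], so beta_a is a free per-arm offset.
    Z = np.hstack([features, np.eye(n_arms)])
    # Prior precision: lam on each theta coordinate, 1 / tau^2 on each beta_a.
    precision = np.diag(np.concatenate([np.full(d, lam), np.full(n_arms, 1.0 / tau**2)]))
    b = np.zeros(d + n_arms)  # running sum of z * r / sigma^2
    chosen = []
    for _ in range(horizon):
        mean = np.linalg.solve(precision, b)  # posterior mean of [theta, beta]
        # If precision = L L^T, then mean + L^{-T} eps ~ N(mean, precision^{-1}).
        L = np.linalg.cholesky(precision)
        eps = rng.standard_normal(d + n_arms)
        sample = mean + np.linalg.solve(L.T, eps)
        # Act greedily with respect to the sampled reward model.
        arm = int(np.argmax(Z @ sample))
        reward = pull(arm)
        # Conjugate Gaussian update with the observed (z, r) pair.
        precision += np.outer(Z[arm], Z[arm]) / sigma**2
        b += Z[arm] * reward / sigma**2
        chosen.append(arm)
    return chosen


if __name__ == "__main__":
    env_rng = np.random.default_rng(1)
    X = env_rng.normal(size=(50, 3))
    bias = np.zeros(50)
    bias[7] = 2.0  # an offset that a purely linear model cannot capture
    means = X @ np.array([1.0, -0.5, 0.25]) + bias
    picks = semi_parametric_ts(X, lambda a: means[a] + 0.1 * env_rng.normal(),
                               horizon=400, sigma=0.1)
    print("fraction of late pulls on arm 7:", np.mean(np.array(picks[200:]) == 7))
```

Under these assumptions, an arm with few pulls has its offset shrunk toward zero by the prior, so its estimate is driven by the shared θ (information transferred across arms). An arm that is pulled often can learn an offset that corrects the linear estimate. Each round costs O((d+N)^3), so for genuinely many arms one would need to exploit the block structure of the precision matrix rather than refactor it from scratch.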
