TY - JOUR
AU - Zhou, Huozhi
AU - Wang, Lingda
AU - Varshney, Lav
AU - Lim, Ee-Peng
PY - 2020/04/03
Y2 - 2024/02/24
TI - A Near-Optimal Change-Detection Based Algorithm for Piecewise-Stationary Combinatorial Semi-Bandits
JF - Proceedings of the AAAI Conference on Artificial Intelligence
JA - AAAI
VL - 34
IS - 04
SE - AAAI Technical Track: Machine Learning
DO - 10.1609/aaai.v34i04.6176
UR - https://ojs.aaai.org/index.php/AAAI/article/view/6176
SP - 6933-6940
AB - <p>We investigate the piecewise-stationary combinatorial semi-bandit problem. Compared to the original combinatorial semi-bandit problem, our setting assumes the reward distributions of base arms may change in a piecewise-stationary manner at unknown time steps. We propose an algorithm, GLR-CUCB, which incorporates an efficient combinatorial semi-bandit algorithm, CUCB, with an almost parameter-free change-point detector, the <em>Generalized Likelihood Ratio Test</em> (GLRT). Our analysis shows that the regret of GLR-CUCB is upper bounded by <em>O</em>(√<em>NKT</em> log <em>T</em>), where <em>N</em> is the number of piecewise-stationary segments, <em>K</em> is the number of base arms, and <em>T</em> is the number of time steps. As a complement, we also derive a nearly matching regret lower bound on the order of Ω(√<em>NKT</em>), for both piecewise-stationary multi-armed bandits and combinatorial semi-bandits, using information-theoretic techniques and judiciously constructed piecewise-stationary bandit instances. Our lower bound is tighter than the best available regret lower bound, which is Ω(√<em>T</em>). Numerical experiments on both synthetic and real-world datasets demonstrate the superiority of GLR-CUCB compared to other state-of-the-art algorithms.</p>
ER -