TY - JOUR
AU - Korkut, Melda
AU - Li, Andrew
PY - 2021/05/18
Y2 - 2024/03/28
TI - Disposable Linear Bandits for Online Recommendations
JF - Proceedings of the AAAI Conference on Artificial Intelligence
JA - AAAI
VL - 35
IS - 5
SE - AAAI Technical Track on Data Mining and Knowledge Management
DO - 10.1609/aaai.v35i5.16540
UR - https://ojs.aaai.org/index.php/AAAI/article/view/16540
SP - 4172
EP - 4180
AB - We study the classic stochastic linear bandit problem under the restriction that each arm may be selected only a limited number of times. This simple constraint, which we call disposability, captures a common restriction that occurs in recommendation problems from a diverse array of applications, ranging from personalized styling services to dating platforms. We show that the regret for this problem is characterized by a previously-unstudied function of the reward distribution among optimal arms. Algorithmically, our upper bound relies on an optimism-based policy which, while computationally intractable, lends itself to approximation via a fast alternating heuristic initialized with a classic similarity score. Experiments show that our policy dominates a set of benchmarks which includes algorithms known to be optimal for the linear bandit without disposability, along with natural modifications to these algorithms for the disposable setting.
ER -