Max-Min Grouped Bandits
DOI:
https://doi.org/10.1609/aaai.v36i8.20838Keywords:
Machine Learning (ML)Abstract
In this paper, we introduce a multi-armed bandit problem termed max-min grouped bandits, in which the arms are arranged in possibly-overlapping groups, and the goal is to find a group whose worst arm has the highest mean reward. This problem is of interest in applications such as recommendation systems, and is also closely related to widely-studied robust optimization problems. We present two algorithms based successive elimination and robust optimization, and derive upper bounds on the number of samples to guarantee finding a max-min optimal or near-optimal group, as well as an algorithm-independent lower bound. We discuss the degree of tightness of our bounds in various cases of interest, and the difficulties in deriving uniformly tight bounds.Downloads
Published
2022-06-28
How to Cite
Wang, Z., & Scarlett, J. (2022). Max-Min Grouped Bandits. Proceedings of the AAAI Conference on Artificial Intelligence, 36(8), 8603-8611. https://doi.org/10.1609/aaai.v36i8.20838
Issue
Section
AAAI Technical Track on Machine Learning III