Max-Min Grouped Bandits

Authors

  • Zhenlin Wang National University of Singapore
  • Jonathan Scarlett National University of Singapore

DOI:

https://doi.org/10.1609/aaai.v36i8.20838

Keywords:

Machine Learning (ML)

Abstract

In this paper, we introduce a multi-armed bandit problem termed max-min grouped bandits, in which the arms are arranged in possibly-overlapping groups, and the goal is to find a group whose worst arm has the highest mean reward. This problem is of interest in applications such as recommendation systems, and is also closely related to widely-studied robust optimization problems. We present two algorithms based successive elimination and robust optimization, and derive upper bounds on the number of samples to guarantee finding a max-min optimal or near-optimal group, as well as an algorithm-independent lower bound. We discuss the degree of tightness of our bounds in various cases of interest, and the difficulties in deriving uniformly tight bounds.

Downloads

Published

2022-06-28

How to Cite

Wang, Z., & Scarlett, J. (2022). Max-Min Grouped Bandits. Proceedings of the AAAI Conference on Artificial Intelligence, 36(8), 8603-8611. https://doi.org/10.1609/aaai.v36i8.20838

Issue

Section

AAAI Technical Track on Machine Learning III