Max-Min Grouped Bandits

Zhenlin Wang; Jonathan Scarlett

doi:10.1609/aaai.v36i8.20838

Authors

Zhenlin Wang National University of Singapore
Jonathan Scarlett National University of Singapore

DOI:

https://doi.org/10.1609/aaai.v36i8.20838

Keywords:

Machine Learning (ML)

Abstract

In this paper, we introduce a multi-armed bandit problem termed max-min grouped bandits, in which the arms are arranged in possibly-overlapping groups, and the goal is to find a group whose worst arm has the highest mean reward. This problem is of interest in applications such as recommendation systems, and is also closely related to widely-studied robust optimization problems. We present two algorithms based on successive elimination and robust optimization, and derive upper bounds on the number of samples needed to guarantee finding a max-min optimal or near-optimal group, as well as an algorithm-independent lower bound. We discuss the degree of tightness of our bounds in various cases of interest, and the difficulties in deriving uniformly tight bounds.
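To make the max-min objective concrete, the following minimal sketch computes the target group when the mean rewards are known exactly. It illustrates only the problem definition, not either of the paper's algorithms, which must estimate the means from noisy samples. The function name `max_min_group`, the group labels, and all numeric values are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Sequence


def max_min_group(means: Sequence[float], groups: Dict[str, List[int]]) -> str:
    """Return the group whose worst (lowest-mean) arm has the highest mean.

    Assumes the true means are known; in the bandit setting they would
    have to be estimated from samples.
    """
    return max(groups, key=lambda g: min(means[a] for a in groups[g]))


# Hypothetical mean rewards for five arms (indices 0..4).
means = [0.9, 0.2, 0.6, 0.5, 0.7]

# Hypothetical, possibly-overlapping groups: arm 2 belongs to both B and C.
groups = {"A": [0, 1], "B": [2, 3], "C": [2, 4]}

best = max_min_group(means, groups)
print(best)  # worst-arm means are A: 0.2, B: 0.5, C: 0.6
```

Group A contains the single best arm (mean 0.9) but also the worst one, so it loses to group C, whose weakest arm still has mean 0.6.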
