Decentralized Stochastic Multi-Player Multi-Armed Walking Bandits

Authors

  • Guojun Xiong, SUNY-Binghamton University
  • Jian Li, SUNY-Binghamton University

DOI:

https://doi.org/10.1609/aaai.v37i9.26251

Keywords:

ML: Online Learning & Bandits

Abstract

Multi-player multi-armed bandits is an increasingly relevant decision-making problem, motivated by applications to cognitive radio systems. Most research on this problem focuses exclusively on settings in which players have full access to all arms and receive no reward when pulling the same arm. Hence, all players solve the same bandit problem with the goal of maximizing their cumulative reward. However, these settings neglect several important factors in many real-world applications, where players have limited access to a dynamic local subset of arms (i.e., an arm may sometimes be "walking" and thus inaccessible to a player). To this end, this paper proposes a multi-player multi-armed walking bandits model that addresses the aforementioned modeling issues. The goal remains to maximize the cumulative reward; however, each player can only pull arms from its local subset, and a player collects the full reward only if no other player pulls the same arm. We adopt the Upper Confidence Bound (UCB) approach to handle the exploration-exploitation tradeoff and employ distributed optimization techniques to properly handle collisions. By carefully integrating these two techniques, we propose a decentralized algorithm with a near-optimal regret guarantee that can be easily implemented to achieve competitive empirical performance.
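To make the setting concrete, below is a minimal, illustrative Python sketch of the model described in the abstract; it is not the authors' algorithm. It assumes Bernoulli arm rewards, a uniformly random arm-availability ("walking") process, and a simple rank-based heuristic as a stand-in for the paper's distributed-optimization collision-handling step. Players earn zero reward on collisions and otherwise run per-player UCB indices over their currently available arms; all constants (K, M, T) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
K, M, T = 6, 3, 5000               # arms, players, horizon (hypothetical)
true_means = rng.uniform(0.2, 0.9, size=K)

counts = np.zeros((M, K))          # per-player pull counts
sums = np.zeros((M, K))            # per-player cumulative rewards
total_reward = 0.0

for t in range(1, T + 1):
    # Each player observes a random local subset of arms; a "walking"
    # arm is one currently absent from the player's subset.
    available = [rng.choice(K, size=int(rng.integers(2, K + 1)), replace=False)
                 for _ in range(M)]

    choices = []
    for m in range(M):
        ucb = np.full(K, -np.inf)
        for k in available[m]:
            if counts[m, k] == 0:
                ucb[k] = np.inf    # force initial exploration
            else:
                mean = sums[m, k] / counts[m, k]
                ucb[k] = mean + np.sqrt(2.0 * np.log(t) / counts[m, k])
        # Rank-based heuristic (a stand-in for the paper's distributed
        # optimization step): player m targets the arm ranked m-th by
        # UCB among its available arms, which reduces collisions.
        order = np.argsort(-ucb[available[m]])
        choices.append(int(available[m][order[min(m, len(order) - 1)]]))

    for m, k in enumerate(choices):
        collided = choices.count(k) > 1
        reward = 0.0 if collided else float(rng.binomial(1, true_means[k]))
        counts[m, k] += 1
        sums[m, k] += reward
        total_reward += reward

print(f"total reward over {T} rounds: {total_reward:.0f}")
```

Because the availability sets differ across players, the rank-based rule above does not fully eliminate collisions; the paper's distributed optimization step is precisely what replaces this crude heuristic with a principled assignment.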

Published

2023-06-26

How to Cite

Xiong, G., & Li, J. (2023). Decentralized Stochastic Multi-Player Multi-Armed Walking Bandits. Proceedings of the AAAI Conference on Artificial Intelligence, 37(9), 10528-10536. https://doi.org/10.1609/aaai.v37i9.26251

Issue

Vol. 37 No. 9 (2023)

Section

AAAI Technical Track on Machine Learning IV