Stochastic Graphical Bandits with Adversarial Corruptions

Authors

  • Shiyin Lu Nanjing University
  • Guanghui Wang Nanjing University
  • Lijun Zhang Nanjing University

Keywords:

Online Learning & Bandits

Abstract

We study bandits with graph-structured feedback, where a learner repeatedly selects an arm and then observes rewards of the chosen arm as well as its neighbors in the feedback graph. Existing work on graphical bandits assumes either stochastic rewards or adversarial rewards, both of which are extremes and appear rarely in real-world scenarios. In this paper, we study graphical bandits with a reward model that interpolates between the two extremes, where the rewards are overall stochastically generated but a small fraction of them can be adversarially corrupted. For this problem, we propose an online algorithm that can utilize the stochastic pattern and also tolerate the adversarial corruptions. The main idea is to restrict exploration to carefully-designed independent sets of the feedback graph and perform exploitation by adopting a soft version of arm elimination. Theoretical analysis shows that our algorithm attains an $O(\alpha \ln{K} \ln{T} + \alpha C)$ regret, where $\alpha$ is the independence number of the feedback graph, $K$ is the number of arms, $T$ is the time horizon, and $C$ quantifies the total corruptions introduced by the adversary. The effectiveness of our algorithm is demonstrated by numerical experiments.

Downloads

Published

2021-05-18

How to Cite

Lu, S., Wang, G., & Zhang, L. (2021). Stochastic Graphical Bandits with Adversarial Corruptions. Proceedings of the AAAI Conference on Artificial Intelligence, 35(10), 8749-8757. Retrieved from https://ojs.aaai.org/index.php/AAAI/article/view/17060

Issue

Section

AAAI Technical Track on Machine Learning III