TY - JOUR
AU - Lu, Shiyin
AU - Wang, Guanghui
AU - Zhang, Lijun
PY - 2021/05/18
Y2 - 2024/04/14
TI - Stochastic Graphical Bandits with Adversarial Corruptions
JF - Proceedings of the AAAI Conference on Artificial Intelligence
JA - AAAI
VL - 35
IS - 10
SE - AAAI Technical Track on Machine Learning III
DO - 10.1609/aaai.v35i10.17060
UR - https://ojs.aaai.org/index.php/AAAI/article/view/17060
SP - 8749
EP - 8757
AB - We study bandits with graph-structured feedback, where a learner repeatedly selects an arm and then observes rewards of the chosen arm as well as its neighbors in the feedback graph. Existing work on graphical bandits assumes either stochastic rewards or adversarial rewards, both of which are extremes and appear rarely in real-world scenarios. In this paper, we study graphical bandits with a reward model that interpolates between the two extremes, where the rewards are overall stochastically generated but a small fraction of them can be adversarially corrupted. For this problem, we propose an online algorithm that can utilize the stochastic pattern and also tolerate the adversarial corruptions. The main idea is to restrict exploration to carefully designed independent sets of the feedback graph and perform exploitation by adopting a soft version of arm elimination. Theoretical analysis shows that our algorithm attains an $O(\alpha \ln{K} \ln{T} + \alpha C)$ regret, where $\alpha$ is the independence number of the feedback graph, $K$ is the number of arms, $T$ is the time horizon, and $C$ quantifies the total corruptions introduced by the adversary. The effectiveness of our algorithm is demonstrated by numerical experiments.
ER -