Stochastic Bandits with Graph Feedback in Non-Stationary Environments

Authors

  • Shiyin Lu, Nanjing University
  • Yao Hu, Alibaba Youku Cognitive and Intelligent Lab
  • Lijun Zhang, Nanjing University

DOI:

https://doi.org/10.1609/aaai.v35i10.17061

Keywords:

Online Learning & Bandits

Abstract

We study a variant of stochastic bandits in which the feedback model is specified by a graph. In this setting, after playing an arm, one observes the rewards of not only the played arm but also of the arms adjacent to it in the graph. Most existing work assumes that the reward distributions are stationary over time, an assumption that is often violated in common scenarios such as recommendation systems and online advertising. To address this limitation, we study stochastic bandits with graph feedback in non-stationary environments and propose algorithms with graph-dependent dynamic regret bounds. When the number of reward distribution changes L is known in advance, one of our algorithms achieves an Õ(√(αLT)) dynamic regret bound. We also develop an adaptive algorithm that can adapt to unknown L and attains an Õ(√(θLT)) dynamic regret. Here, α and θ are graph-dependent quantities and T is the time horizon.
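The feedback model described above can be illustrated with a minimal sketch. This is not the paper's algorithm; it only simulates the environment: each play of an arm reveals Bernoulli rewards for that arm and its graph neighbors, and the reward means can switch at change points (the non-stationarity the paper addresses). The class name, the example graph, and the mean values are all illustrative assumptions.

```python
import random

class GraphBanditEnv:
    """Illustrative sketch: playing arm i reveals rewards for i and its neighbors."""

    def __init__(self, means, neighbors, seed=0):
        self.means = list(means)        # per-arm Bernoulli reward means (assumed)
        self.neighbors = neighbors      # adjacency map: arm -> set of adjacent arms
        self.rng = random.Random(seed)

    def play(self, arm):
        """Return {observed arm: sampled 0/1 reward} for the arm and its neighbors."""
        observed = {arm} | self.neighbors[arm]
        return {a: int(self.rng.random() < self.means[a]) for a in observed}

    def shift(self, new_means):
        """A change point: the reward distributions switch (one of L changes)."""
        self.means = list(new_means)

# Usage: a 3-arm path graph 0 - 1 - 2; playing the middle arm observes all arms.
env = GraphBanditEnv(
    means=[0.2, 0.5, 0.8],
    neighbors={0: {1}, 1: {0, 2}, 2: {1}},
)
feedback = env.play(1)          # observes rewards of arms 0, 1, and 2
env.shift([0.8, 0.5, 0.2])      # non-stationarity: means change mid-horizon
```

The side observations are what make the regret graph-dependent: a denser graph lets the learner estimate more arms per round, which is where quantities like α (the independence number) enter the bounds.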

Published

2021-05-18

How to Cite

Lu, S., Hu, Y., & Zhang, L. (2021). Stochastic Bandits with Graph Feedback in Non-Stationary Environments. Proceedings of the AAAI Conference on Artificial Intelligence, 35(10), 8758-8766. https://doi.org/10.1609/aaai.v35i10.17061

Section

AAAI Technical Track on Machine Learning III