Networked Restless Bandits with Positive Externalities

Authors

  • Christine Herlihy, University of Maryland, College Park
  • John P. Dickerson, University of Maryland, College Park

DOI

https://doi.org/10.1609/aaai.v37i10.26415

Keywords

PRS: Planning With Markov Models (MDPs, POMDPs), PRS: Planning Under Uncertainty

Abstract

Restless multi-armed bandits are often used to model budget-constrained resource allocation tasks where receipt of the resource is associated with an increased probability of a favorable state transition. Prior work assumes that individual arms only benefit if they receive the resource directly. However, many allocation tasks occur within communities and can be characterized by positive externalities that allow arms to derive partial benefit when their neighbor(s) receive the resource. We thus introduce networked restless bandits, a novel multi-armed bandit setting in which arms are both restless and embedded within a directed graph. We then present Greta, a graph-aware, Whittle index-based heuristic algorithm that can be used to efficiently construct a constrained reward-maximizing action vector at each timestep. Our empirical results demonstrate that Greta outperforms comparison policies across a range of hyperparameter values and graph topologies. Code and appendices are available at https://github.com/crherlihy/networked_restless_bandits.
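This page does not describe Greta's internals, so the following is only a minimal, hypothetical sketch of the kind of budget-constrained selection step the abstract alludes to: per-arm Whittle indices are combined with a graph-based externality bonus, and the budget is spent on the highest-priority arms. The function name select_actions, the externality weight alpha, and the priority rule are assumptions introduced here for illustration, not the authors' algorithm; see the linked repository for the actual implementation.

# Illustrative sketch only: a graph-aware, budget-constrained selection step
# in the spirit of a Whittle index heuristic. The helper name, the weight
# `alpha`, and the priority rule are assumptions, not the paper's Greta method.
import numpy as np

def select_actions(whittle_indices: np.ndarray,
                   adjacency: np.ndarray,
                   budget: int,
                   alpha: float = 0.5) -> np.ndarray:
    """Choose which arms to pull this timestep.

    whittle_indices : per-arm Whittle indices, shape (n_arms,)
    adjacency       : directed graph; adjacency[i, j] = 1 if arm i derives
                      partial benefit when arm j is pulled, shape (n_arms, n_arms)
    budget          : number of arms that may be pulled this timestep
    alpha           : assumed discount on benefit received via a neighbor
    """
    n_arms = whittle_indices.shape[0]
    # Credit each candidate arm j with the (discounted) indices of the arms
    # that would partially benefit if j were pulled.
    spillover = alpha * adjacency.T @ whittle_indices
    priority = whittle_indices + spillover

    # Greedily pull the `budget` arms with the highest graph-adjusted priority.
    chosen = np.argsort(priority)[::-1][:budget]
    action = np.zeros(n_arms, dtype=int)
    action[chosen] = 1
    return action

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 6
    w = rng.random(n)                               # placeholder Whittle indices
    A = (rng.random((n, n)) < 0.3).astype(float)    # random directed graph
    np.fill_diagonal(A, 0)
    print(select_actions(w, A, budget=2))

The sketch keeps the budget constraint explicit (exactly `budget` entries of the action vector are set to 1) and treats externalities as a simple additive bonus; how Greta actually weighs neighbor benefit is specified in the paper.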

Published

2023-06-26

How to Cite

Herlihy, C., & Dickerson, J. P. (2023). Networked Restless Bandits with Positive Externalities. Proceedings of the AAAI Conference on Artificial Intelligence, 37(10), 11997-12004. https://doi.org/10.1609/aaai.v37i10.26415

Section

AAAI Technical Track on Planning, Routing, and Scheduling