Multiple-play Stochastic Bandits with Prioritized Arm Capacity Sharing

Authors

  • Hong Xie University of Science and Technology of China & State Key Laboratory of Cognitive Intelligence
  • Haoran Gu Daqing Oilfield Chongqing Company
  • Yanying Huang College of Computer Science, Chongqing University
  • Tao Tan University of Science and Technology of China & State Key Laboratory of Cognitive Intelligence
  • Defu Lian University of Science and Technology of China & State Key Laboratory of Cognitive Intelligence

DOI:

https://doi.org/10.1609/aaai.v40i43.41002

Abstract

This paper proposes a variant of multiple-play stochastic bandits tailored to resource allocation problems arising from LLM applications, edge intelligence, etc. The model is composed of finite number of arms and plays. Each arm has a stochastic number of capacities, and each unit of capacity is associated with a reward function. Each play is associated with a priority weight. When multiple plays compete for the arm capacity, the arm capacity is allocated in a larger priority weight first manner. Instance independent and instance dependent regret lower bounds are proved, revealing the impact of model parameters on the hardness of learning the optimal allocation policy. When model parameters are given, we design an algorithm named MSB-PRS-OffOpt to locate the optimal play allocation policy with a polynomial computational complexity in the number of arms and plays. Utilizing MSB-PRS-OffOpt as a subroutine, an approximate upper confidence bound (UCB) based algorithm is designed, which has instance independent and instance dependent regret upper bounds matching the corresponding lower bound up to acceptable factors. To this end, we address nontrivial technical challenges arising from optimizing and learning under a special nonlinear combinatorial utility function induced by the prioritized resource sharing mechanism.

Downloads

Published

2026-03-14

How to Cite

Xie, H., Gu, H., Huang, Y., Tan, T., & Lian, D. (2026). Multiple-play Stochastic Bandits with Prioritized Arm Capacity Sharing. Proceedings of the AAAI Conference on Artificial Intelligence, 40(43), 36766–36774. https://doi.org/10.1609/aaai.v40i43.41002

Issue

Section

AAAI Technical Track on Reasoning under Uncertainty