Knapsack Based Optimal Policies for Budget–Limited Multi–Armed Bandits

Authors

  • Long Tran-Thanh, University of Southampton
  • Archie Chapman, The University of Sydney
  • Alex Rogers, University of Southampton
  • Nicholas Jennings, University of Southampton

DOI:

https://doi.org/10.1609/aaai.v26i1.8279

Keywords:

sequential decision making under uncertainty, multi-armed bandits, cost budget

Abstract

In budget–limited multi–armed bandit (MAB) problems, the learner's actions are costly and constrained by a fixed budget. Consequently, an optimal exploitation policy may not be to pull the optimal arm repeatedly, as is the case in other variants of MAB, but rather to pull the sequence of different arms that maximises the agent's total reward within the budget. This difference from existing MABs means that new approaches to maximising the total reward are required. Given this, we develop two pulling policies, namely: (i) KUBE; and (ii) fractional KUBE. Whereas the former achieves up to 40% better performance in our experimental settings, the latter is computationally less expensive. We also prove logarithmic upper bounds for the regret of both policies, and show that these bounds are asymptotically optimal (i.e. they only differ from the best possible regret by a constant factor).
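
To make the knapsack-based idea in the abstract concrete, the sketch below illustrates a greedy, fractional-relaxation style policy for the budget-limited setting: at each round it pulls the affordable arm with the highest upper-confidence reward-to-cost density until the budget runs out. The arm representation, the confidence-bound form, and the helper fractional_kube_style are illustrative assumptions for this sketch, not the paper's exact KUBE or fractional KUBE definitions.

```python
import math
import random

def fractional_kube_style(arms, budget, seed=0):
    """Illustrative budget-limited bandit policy (hypothetical sketch).

    arms: list of (cost, reward_sampler) pairs; costs are known, rewards in [0, 1].
    budget: total budget available for pulls.
    Returns the total reward collected before the budget is exhausted.
    """
    rng = random.Random(seed)
    k = len(arms)
    counts = [0] * k          # number of pulls per arm
    means = [0.0] * k         # empirical mean reward per arm
    total_reward = 0.0
    t = 0

    def pull(i):
        nonlocal budget, total_reward, t
        cost, sampler = arms[i]
        reward = sampler(rng)
        budget -= cost
        total_reward += reward
        counts[i] += 1
        means[i] += (reward - means[i]) / counts[i]
        t += 1

    # Initialisation: pull each affordable arm once to seed the estimates.
    for i, (cost, _) in enumerate(arms):
        if budget >= cost:
            pull(i)

    # Main loop: greedily pull the affordable arm with the highest
    # upper-confidence reward-to-cost density, i.e. the greedy (fractional)
    # solution of a knapsack over optimistic value estimates.
    while True:
        affordable = [i for i, (c, _) in enumerate(arms) if c <= budget]
        if not affordable:
            break
        ucb = [means[i] + math.sqrt(2 * math.log(max(t, 2)) / counts[i])
               if counts[i] > 0 else float("inf") for i in range(k)]
        best = max(affordable, key=lambda i: ucb[i] / arms[i][0])
        pull(best)

    return total_reward

# Example: three arms with unit-interval rewards and differing pull costs.
arms = [
    (1.0, lambda rng: rng.betavariate(2, 5)),   # cheap, low mean reward
    (2.0, lambda rng: rng.betavariate(5, 2)),   # moderate cost, high mean reward
    (4.0, lambda rng: rng.betavariate(4, 4)),   # expensive, medium mean reward
]
print(fractional_kube_style(arms, budget=100.0))
```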

Published

2021-09-20

How to Cite

Tran-Thanh, L., Chapman, A., Rogers, A., & Jennings, N. (2021). Knapsack Based Optimal Policies for Budget–Limited Multi–Armed Bandits. Proceedings of the AAAI Conference on Artificial Intelligence, 26(1), 1134-1140. https://doi.org/10.1609/aaai.v26i1.8279

Section

AAAI Technical Track: Machine Learning