Multi-Armed Bandit with Budget Constraint and Variable Costs

Wenkui Ding; Tao Qin; Xu-Dong Zhang; Tie-Yan Liu

doi:10.1609/aaai.v27i1.8637

Authors

Wenkui Ding, Tsinghua University
Tao Qin, Microsoft Research Asia
Xu-Dong Zhang, Tsinghua University
Tie-Yan Liu, Microsoft Research Asia

Keywords:

multi-armed bandit, online learning, budget constraint, variable costs

Abstract

We study multi-armed bandit problems with a budget constraint and variable costs (MAB-BV). In this setting, pulling an arm yields a random reward and incurs a random cost, and the objective of an algorithm is to pull a sequence of arms that maximizes the expected total reward while keeping the total cost of the pulls within a budget B. This setting models many Internet applications (e.g., ad exchange, sponsored search, and cloud computing) more accurately than previous settings, in which pulling an arm is either costless or has a fixed cost. We propose two UCB-based algorithms for the new setting. The first algorithm requires prior knowledge of a lower bound on the expected costs when computing the exploration term. The second algorithm removes this requirement by estimating the minimal expected cost from empirical observations, and can therefore be applied to real-world applications where such prior knowledge is not available. We prove that both algorithms achieve regret bounds of O(ln B). Furthermore, we show that applying the proposed algorithms to a previous setting with fixed costs (which can be regarded as a special case of ours) improves the previously obtained regret bound. Our simulation results on real-time bidding in ad exchange verify the effectiveness of the algorithms and are consistent with our theoretical analysis.
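To make the setting concrete, the following is a minimal Python sketch of a budget-limited bandit loop. It is not the paper's algorithm: the abstract does not give the exact exploration term, so the sketch assumes a generic index (empirical reward-to-cost ratio plus a standard UCB-style bonus). The names `budgeted_ucb`, `make_arm`, and `min_cost` are hypothetical, and rewards and costs are assumed to lie in [0, 1] with strictly positive costs.

```python
import math
import random


def budgeted_ucb(arms, budget, rng, min_cost=1e-6):
    """Pull arms until the budget runs out; return (total_reward, spent, pulls).

    Assumption: the index below is a generic stand-in, not the paper's bound.
    Each arm is a callable taking rng and returning (reward, cost).
    """
    k = len(arms)
    pulls = [0] * k
    reward_sum = [0.0] * k
    cost_sum = [0.0] * k
    total_reward = 0.0
    spent = 0.0
    t = 0

    def index(j):
        # Empirical reward/cost ratio plus an exploration bonus (assumed form).
        mean_reward = reward_sum[j] / pulls[j]
        mean_cost = max(cost_sum[j] / pulls[j], min_cost)  # guard against zero
        bonus = math.sqrt(2.0 * math.log(t) / pulls[j])
        return mean_reward / mean_cost + bonus

    while True:
        t += 1
        # Pull every arm once first, then follow the highest index.
        arm = t - 1 if t <= k else max(range(k), key=index)
        reward, cost = arms[arm](rng)
        if spent + cost > budget:
            break  # the budget cannot cover this pull, so stop
        spent += cost
        total_reward += reward
        pulls[arm] += 1
        reward_sum[arm] += reward
        cost_sum[arm] += cost
    return total_reward, spent, pulls


def make_arm(p_reward, costs):
    """Hypothetical arm: Bernoulli reward, cost drawn uniformly from `costs`."""
    def pull(rng):
        reward = 1.0 if rng.random() < p_reward else 0.0
        return reward, rng.choice(costs)
    return pull


if __name__ == "__main__":
    arms = [make_arm(0.3, [0.4, 0.8]), make_arm(0.6, [0.2, 0.5])]
    total, spent, pulls = budgeted_ucb(arms, budget=100.0, rng=random.Random(0))
    print(f"reward={total:.1f} spent={spent:.1f} pulls={pulls}")
```

The sketch stops at the first pull whose cost exceeds the remaining budget. This is one simple stopping convention chosen for illustration, and other conventions are possible.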
