Non-stochastic Budgeted Online Pricing with Semi-Bandit Feedback

Xiang Liu; Hau Chan; Minming Li; Weiwei Wu; Long Tran-Thanh

doi:10.1609/aaai.v39i18.34089

Authors

Xiang Liu Southeast University The Chinese University of Hong Kong
Hau Chan University of Nebraska, Lincoln
Minming Li City University of Hong Kong
Weiwei Wu Southeast University
Long Tran-Thanh The University of Warwick

DOI:

https://doi.org/10.1609/aaai.v39i18.34089

Abstract

We consider a general non-stochastic online pricing bandit setting in a procurement scenario where a buyer with a budget wants to procure items from a fixed set of sellers to maximize the buyer's reward by dynamically offering purchasing prices to the sellers, where the sellers' costs and values at each time period can change arbitrarily and the sellers determine whether to accept the offered prices to sell the items. This setting models online pricing scenarios of procuring resources or services in multi-agent systems. We first consider the offline setting when sellers' costs and values are known in advance and investigate the best fixed-price policy in hindsight. We show that it has a tight approximation guarantee with respect to the offline optimal solutions. In the general online setting, we propose an online pricing policy, Granularity-based Pricing (GAP), which exploits underlying side-information from the feedback graph when the budget is given as the input. We show that GAP achieves an upper bound of O(n{v_{max}}{c_{min}}sqrt{B/c_{min}}ln B) on the alpha-regret where n, v_{max}, c_{min}, and B are the number, the maximum value, the minimum cost of sellers, and the budget, respectively. We then extend it to the unknown budget case by developing a variant of GAP, namely Doubling-GAP, and show its alpha-regret is at most O(n{v_{max}}{c_{min}}sqrt{B/c_{min}}ln2 B). We also provide an alpha-regret lower bound Omega(v_{max}sqrt{Bn/c_{min}}) of any online policy that is tight up to sub-linear terms. We conduct simulation experiments to show that the proposed policy outperforms the baseline algorithms.

Non-stochastic Budgeted Online Pricing with Semi-Bandit Feedback

Authors

DOI:

Abstract

Downloads

Published

How to Cite

Issue

Section

Information