Non-stochastic Budgeted Online Pricing with Semi-Bandit Feedback

Authors

  • Xiang Liu Southeast University The Chinese University of Hong Kong
  • Hau Chan University of Nebraska, Lincoln
  • Minming Li City University of Hong Kong
  • Weiwei Wu Southeast University
  • Long Tran-Thanh The University of Warwick

DOI:

https://doi.org/10.1609/aaai.v39i18.34089

Abstract

We consider a general non-stochastic online pricing bandit setting in a procurement scenario where a buyer with a budget wants to procure items from a fixed set of sellers to maximize the buyer's reward by dynamically offering purchasing prices to the sellers, where the sellers' costs and values at each time period can change arbitrarily and the sellers determine whether to accept the offered prices to sell the items. This setting models online pricing scenarios of procuring resources or services in multi-agent systems. We first consider the offline setting when sellers' costs and values are known in advance and investigate the best fixed-price policy in hindsight. We show that it has a tight approximation guarantee with respect to the offline optimal solutions. In the general online setting, we propose an online pricing policy, Granularity-based Pricing (GAP), which exploits underlying side-information from the feedback graph when the budget is given as the input. We show that GAP achieves an upper bound of O(n{v_{max}}{c_{min}}sqrt{B/c_{min}}ln B) on the alpha-regret where n, v_{max}, c_{min}, and B are the number, the maximum value, the minimum cost of sellers, and the budget, respectively. We then extend it to the unknown budget case by developing a variant of GAP, namely Doubling-GAP, and show its alpha-regret is at most O(n{v_{max}}{c_{min}}sqrt{B/c_{min}}ln2 B). We also provide an alpha-regret lower bound Omega(v_{max}sqrt{Bn/c_{min}}) of any online policy that is tight up to sub-linear terms. We conduct simulation experiments to show that the proposed policy outperforms the baseline algorithms.

Downloads

Published

2025-04-11

How to Cite

Liu, X., Chan, H., Li, M., Wu, W., & Tran-Thanh, L. (2025). Non-stochastic Budgeted Online Pricing with Semi-Bandit Feedback. Proceedings of the AAAI Conference on Artificial Intelligence, 39(18), 18978–18986. https://doi.org/10.1609/aaai.v39i18.34089

Issue

Section

AAAI Technical Track on Machine Learning IV