Fair Algorithms with Probing for Multi-Agent Multi-Armed Bandits

Tianyi Xu; Jiaxin Liu; Nicholas Mattei; Zizhan Zheng

doi:10.1609/aaai.v40i32.39950

Authors

Tianyi Xu Tulane University
Jiaxin Liu University of Illinois at Urbana-Champaign
Nicholas Mattei Tulane University
Zizhan Zheng Tulane University

DOI:

https://doi.org/10.1609/aaai.v40i32.39950

Abstract

We propose a multi-agent multi-armed bandit (MA-MAB) framework to ensure fair outcomes across agents while maximizing overall system performance. For example, in a ridesharing setting where a central dispatcher assigns drivers to distinct geographic regions, utilitarian welfare (the sum of driver earnings) can be highly skewed—some drivers may receive no rides. We instead measure fairness by Nash social welfare, i.e., the product of individual rewards. A key challenge in this setting is decision-making under limited information about arm rewards (geographic regions). To address this, we introduce a novel probing mechanism that strategically gathers information about selected arms before assignment. In the offline setting, where reward distributions are known, we exploit submodularity to design a greedy probing algorithm with a constant-factor approximation guarantee. In the online setting, we develop a probing-based algorithm that achieves sublinear regret while preserving Nash social welfare. Extensive experiments on synthetic and real-world datasets demonstrate that our approach outperforms baseline methods in both fairness and efficiency.

Fair Algorithms with Probing for Multi-Agent Multi-Armed Bandits

Authors

DOI:

Abstract

Downloads

Published

How to Cite

Issue

Section

Information