Fair Algorithms with Probing for Multi-Agent Multi-Armed Bandits
DOI:
https://doi.org/10.1609/aaai.v40i32.39950Abstract
We propose a multi-agent multi-armed bandit (MA-MAB) framework to ensure fair outcomes across agents while maximizing overall system performance. For example, in a ridesharing setting where a central dispatcher assigns drivers to distinct geographic regions, utilitarian welfare (the sum of driver earnings) can be highly skewed—some drivers may receive no rides. We instead measure fairness by Nash social welfare, i.e., the product of individual rewards. A key challenge in this setting is decision-making under limited information about arm rewards (geographic regions). To address this, we introduce a novel probing mechanism that strategically gathers information about selected arms before assignment. In the offline setting, where reward distributions are known, we exploit submodularity to design a greedy probing algorithm with a constant-factor approximation guarantee. In the online setting, we develop a probing-based algorithm that achieves sublinear regret while preserving Nash social welfare. Extensive experiments on synthetic and real-world datasets demonstrate that our approach outperforms baseline methods in both fairness and efficiency.Downloads
Published
2026-03-14
How to Cite
Xu, T., Liu, J., Mattei, N., & Zheng, Z. (2026). Fair Algorithms with Probing for Multi-Agent Multi-Armed Bandits. Proceedings of the AAAI Conference on Artificial Intelligence, 40(32), 27332–27340. https://doi.org/10.1609/aaai.v40i32.39950
Issue
Section
AAAI Technical Track on Machine Learning IX