Fair Algorithms with Probing for Multi-Agent Multi-Armed Bandits

Authors

  • Tianyi Xu Tulane University
  • Jiaxin Liu University of Illinois at Urbana-Champaign
  • Nicholas Mattei Tulane University
  • Zizhan Zheng Tulane University

DOI:

https://doi.org/10.1609/aaai.v40i32.39950

Abstract

We propose a multi-agent multi-armed bandit (MA-MAB) framework to ensure fair outcomes across agents while maximizing overall system performance. For example, in a ridesharing setting where a central dispatcher assigns drivers to distinct geographic regions, utilitarian welfare (the sum of driver earnings) can be highly skewed—some drivers may receive no rides. We instead measure fairness by Nash social welfare, i.e., the product of individual rewards. A key challenge in this setting is decision-making under limited information about arm rewards (geographic regions). To address this, we introduce a novel probing mechanism that strategically gathers information about selected arms before assignment. In the offline setting, where reward distributions are known, we exploit submodularity to design a greedy probing algorithm with a constant-factor approximation guarantee. In the online setting, we develop a probing-based algorithm that achieves sublinear regret while preserving Nash social welfare. Extensive experiments on synthetic and real-world datasets demonstrate that our approach outperforms baseline methods in both fairness and efficiency.

Published

2026-03-14

How to Cite

Xu, T., Liu, J., Mattei, N., & Zheng, Z. (2026). Fair Algorithms with Probing for Multi-Agent Multi-Armed Bandits. Proceedings of the AAAI Conference on Artificial Intelligence, 40(32), 27332–27340. https://doi.org/10.1609/aaai.v40i32.39950

Issue

Section

AAAI Technical Track on Machine Learning IX