Online Learning for Decentralized Multi-Agent Planning in Repeated Hedonic Skill Games

Jaber Valizadeh; Ray Telikani

doi:10.1609/icaps.v36i1.42844

Authors

Jaber Valizadeh School of Computing, Data and Mathematical Sciences, Western Sydney University, Sydney, NSW, Australia
Ray Telikani Data Science Institute, University of Technology Sydney, Sydney, NSW, Australia


Abstract

Coalition formation is a fundamental capability in decentralized multi-agent planning, where heterogeneous agents must coordinate to execute tasks whose feasibility depends on complementary capabilities. Hedonic Skill Games (HSGs) offer a compact model for representing such structured interactions, but existing results rely on the unrealistic assumption that agents possess full knowledge of both task requirements and the skills of their coalition members. This assumption breaks down in many planning domains, such as crowdsourced task allocation or distributed multi-robot task execution, where agents must plan under uncertainty and learn from partial observations. We introduce repeated Hedonic Skill Games, in which agents repeatedly form coalitions, execute feasible tasks, and receive only bandit feedback corresponding to their realized utilities. We develop an Upper Confidence Bound (UCB)-driven online learning algorithm that enables decentralized agents to jointly plan coalition choices despite incomplete information, balancing exploration of unknown coalitions with exploitation of realized utilities. We show that the resulting dynamics achieve sublinear Nash regret and converge to ε-approximate Nash equilibria. Experiments on synthetic and real-world problem instances demonstrate convergence behavior, improved social welfare, and the practicality of the approach for large-scale distributed planning scenarios.
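The abstract does not specify the algorithm, so the following is only a minimal, hypothetical Python sketch of the general idea: decentralized agents each run an independent UCB learner over which task (and hence which coalition) to join, observing nothing but their own realized utility. It assumes a simplified payoff model that is not taken from the paper: a coalition is feasible if the union of its members' skills covers the task's requirements, and a feasible coalition splits the task value equally. It uses the textbook UCB1 index, not necessarily the authors' UCB variant. The names `UCBAgent`, `realized_utilities`, `run`, and `is_eps_nash`, the equal-split rule, and the toy instance are all illustrative assumptions.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class UCBAgent:
    """A decentralized learner whose arms are the tasks it may join (hypothetical)."""
    skills: frozenset
    n_arms: int
    counts: list = field(init=False)
    sums: list = field(init=False)

    def __post_init__(self):
        self.counts = [0] * self.n_arms
        self.sums = [0.0] * self.n_arms

    def choose(self, t):
        # Try every arm once before relying on confidence bounds.
        for a in range(self.n_arms):
            if self.counts[a] == 0:
                return a
        # Textbook UCB1 index: empirical mean utility plus an exploration bonus.
        return max(
            range(self.n_arms),
            key=lambda a: self.sums[a] / self.counts[a]
            + math.sqrt(2.0 * math.log(t) / self.counts[a]),
        )

    def update(self, arm, reward):
        # Bandit feedback: the agent sees only its own realized utility.
        self.counts[arm] += 1
        self.sums[arm] += reward


def realized_utilities(choices, agents, requirements, values, rng, noise=0.05):
    """Assumed payoff: agents choosing the same task form a coalition that is
    feasible iff the union of their skills covers the task's requirements.
    A feasible coalition splits the task value equally (plus noise, clipped to
    [0, 1]); an infeasible one earns 0."""
    utils = [0.0] * len(agents)
    for task, req in enumerate(requirements):
        members = [i for i, c in enumerate(choices) if c == task]
        if not members:
            continue
        covered = frozenset().union(*(agents[i].skills for i in members))
        if req <= covered:
            share = values[task] / len(members)
            for i in members:
                jitter = rng.gauss(0.0, noise) if noise > 0 else 0.0
                utils[i] = min(1.0, max(0.0, share + jitter))
    return utils


def run(agents, requirements, values, horizon, seed=0):
    """Repeat the game; each agent learns independently from its own payoff."""
    rng = random.Random(seed)
    history = []
    for t in range(1, horizon + 1):
        choices = [ag.choose(t) for ag in agents]
        utils = realized_utilities(choices, agents, requirements, values, rng)
        for ag, c, u in zip(agents, choices, utils):
            ag.update(c, u)
        history.append(tuple(choices))
    return history


def is_eps_nash(choices, agents, requirements, values, eps):
    """Oracle check on noiseless payoffs (which the learners never see): no agent
    can gain more than eps by unilaterally switching to another task."""
    rng = random.Random(0)
    base = realized_utilities(choices, agents, requirements, values, rng, noise=0.0)
    for i in range(len(agents)):
        for alt in range(len(requirements)):
            dev = list(choices)
            dev[i] = alt
            u = realized_utilities(dev, agents, requirements, values, rng, noise=0.0)[i]
            if u > base[i] + eps:
                return False
    return True


# Toy instance: task 0 needs skills {a, b}; task 1 needs skill c, which nobody has.
agents = [UCBAgent(frozenset({"a"}), 2), UCBAgent(frozenset({"b"}), 2)]
requirements = [frozenset({"a", "b"}), frozenset({"c"})]
values = [1.0, 1.0]
history = run(agents, requirements, values, horizon=2000)
print(history[-5:])
```

In this toy instance the learners settle on the joint choice (0, 0), where their complementary skills make task 0 feasible. Note that (1, 1) also passes the ε-Nash check, since no single agent can make task 0 feasible alone, which illustrates that an equilibrium can still have poor social welfare. Because each learner treats the other agents as part of an environment that changes as they learn, single-agent UCB1 regret bounds do not carry over directly; the sketch makes no claim about the Nash-regret guarantee stated above.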
