Online Learning for Decentralized Multi-Agent Planning in Repeated Hedonic Skill Games

Authors

  • Jaber Valizadeh, School of Computing, Data and Mathematical Sciences, Western Sydney University, Sydney, NSW, Australia
  • Ray Telikani, Data Science Institute, University of Technology Sydney, Sydney, NSW, Australia

DOI:

https://doi.org/10.1609/icaps.v36i1.42844

Abstract

Coalition formation is a fundamental capability in decentralized multi-agent planning, where heterogeneous agents must coordinate to execute tasks whose feasibility depends on complementary capabilities. Hedonic Skill Games (HSGs) offer a compact model for representing such structured interactions, but existing results rely on the unrealistic assumption that agents possess full knowledge of both task requirements and the skills of their coalition members. This assumption breaks down in many planning domains, such as crowdsourced task allocation or distributed multi-robot task execution, where agents must plan under uncertainty and learn from partial observations. We introduce repeated Hedonic Skill Games, in which agents repeatedly form coalitions, execute feasible tasks, and receive only bandit feedback corresponding to their realized utilities. We develop an Upper Confidence Bound (UCB)-driven online learning algorithm that enables decentralized agents to jointly plan coalition choices despite incomplete information, balancing exploration of unknown coalitions with exploitation of realized utilities. We show that the resulting dynamics achieve sublinear Nash regret and converge to ε-approximate Nash equilibria. Experiments on synthetic and real-world problem instances demonstrate convergence behavior, improved social welfare, and the practicality of the approach for large-scale distributed planning scenarios.
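
To make the exploration/exploitation trade-off mentioned in the abstract concrete, the following is a minimal sketch of a generic UCB1-style rule, not the algorithm from the paper. It assumes a single agent choosing among a fixed set of candidate coalitions and observing only its own realized utility (bandit feedback), with utilities modeled as Bernoulli draws. The names `ucb_index`, `UCBAgent`, `simulate`, the exploration constant `c`, and the coalition labels are all hypothetical and chosen for illustration.

```python
import math
import random


def ucb_index(mean, count, t, c=2.0):
    """UCB1-style index: empirical mean plus an exploration bonus.

    `c` is an assumed exploration constant, not a value from the paper.
    """
    if count == 0:
        return math.inf  # coalitions never tried are explored first
    return mean + math.sqrt(c * math.log(t) / count)


class UCBAgent:
    """One agent that treats each candidate coalition as a bandit arm."""

    def __init__(self, coalitions):
        self.coalitions = list(coalitions)
        self.counts = {k: 0 for k in self.coalitions}
        self.means = {k: 0.0 for k in self.coalitions}
        self.t = 0  # number of rounds played so far

    def choose(self):
        """Pick the coalition with the highest UCB index this round."""
        self.t += 1
        return max(
            self.coalitions,
            key=lambda k: ucb_index(self.means[k], self.counts[k], self.t),
        )

    def update(self, coalition, utility):
        """Fold the realized utility into the running mean for that coalition."""
        self.counts[coalition] += 1
        n = self.counts[coalition]
        self.means[coalition] += (utility - self.means[coalition]) / n


def simulate(true_means, rounds, seed=0):
    """Run the agent against hidden Bernoulli utilities (an assumed toy model)."""
    rng = random.Random(seed)
    agent = UCBAgent(true_means)
    for _ in range(rounds):
        choice = agent.choose()
        utility = 1.0 if rng.random() < true_means[choice] else 0.0
        agent.update(choice, utility)
    return agent


if __name__ == "__main__":
    # Hypothetical coalitions with hidden expected utilities.
    agent = simulate({"A": 0.2, "B": 0.5, "C": 0.8}, rounds=3000)
    print(agent.counts)
```

In this toy setting the agent tries every coalition once, then concentrates its choices on the coalition with the highest expected utility while still sampling the others occasionally. The decentralized setting described in the abstract is harder, because each agent's utility also depends on what the other agents choose; this sketch deliberately ignores that interaction.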

Published

2026-06-08

How to Cite

Valizadeh, J., & Telikani, R. (2026). Online Learning for Decentralized Multi-Agent Planning in Repeated Hedonic Skill Games. Proceedings of the International Conference on Automated Planning and Scheduling, 36(1), 342–350. https://doi.org/10.1609/icaps.v36i1.42844