Centralized Training with Hybrid Execution in Multi-Agent Reinforcement Learning via Predictive Observation Imputation (Abstract Reprint)

Authors

  • Pedro P. Santos, Artificial Intelligence for People and Society (GAIPS), INESC-ID Instituto Superior Técnico, University of Lisbon
  • Diogo S. Carvalho, Artificial Intelligence for People and Society (GAIPS), INESC-ID Instituto Superior Técnico, University of Lisbon
  • Miguel Vasco, KTH Royal Institute of Technology
  • Alberto Sardinha, Pontifical Catholic University of Rio de Janeiro
  • Pedro A. Santos, Artificial Intelligence for People and Society (GAIPS), INESC-ID Instituto Superior Técnico, University of Lisbon
  • Ana Paiva, Artificial Intelligence for People and Society (GAIPS), INESC-ID Instituto Superior Técnico, University of Lisbon
  • Francisco S. Melo, Artificial Intelligence for People and Society (GAIPS), INESC-ID Instituto Superior Técnico, University of Lisbon

DOI

https://doi.org/10.1609/aaai.v40i47.41409

Abstract

We study hybrid execution in multi-agent reinforcement learning (MARL), a paradigm in which agents aim to complete cooperative tasks under arbitrary communication levels at execution time by taking advantage of information sharing among the agents. Under hybrid execution, the communication level can range from a setting in which no communication is allowed between agents (fully decentralized) to a setting featuring full communication (fully centralized), but the agents do not know beforehand which communication level they will encounter at execution time. We contribute MARO, an approach that uses an auto-regressive predictive model, trained in a centralized manner, to estimate the observations of other agents that are missing at execution time. We evaluate MARO on standard scenarios and on extensions of previous benchmarks tailored to emphasize the impact of partial observability in MARL. Experimental results show that our method consistently outperforms relevant baselines, allowing agents to act under faulty communication while successfully exploiting shared information.
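
The imputation idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `impute`, `rollout`, and `doubling_predictor` are hypothetical, and a hand-written rule that doubles the previous observation stands in for the learned auto-regressive predictive model. The sketch assumes that, at each step, an agent receives the observations of only some agents, fills in the rest with the model's prediction, and feeds its own estimate back into the model at the next step.

```python
from typing import Callable, List, Sequence

Obs = List[float]       # one agent's observation vector
JointObs = List[Obs]    # one observation per agent


def impute(received: JointObs, available: Sequence[bool], predicted: JointObs) -> JointObs:
    """Keep each observation that arrived; otherwise use the model's prediction."""
    return [list(r) if ok else list(p) for r, ok, p in zip(received, available, predicted)]


def rollout(
    observations: Sequence[JointObs],
    availability: Sequence[Sequence[bool]],
    predict: Callable[[JointObs], JointObs],
    initial: JointObs,
) -> List[JointObs]:
    """Estimate the joint observation at every step under intermittent communication."""
    estimates: List[JointObs] = []
    prev = initial
    for received, available in zip(observations, availability):
        predicted = predict(prev)                 # one-step prediction from the last estimate
        current = impute(received, available, predicted)
        estimates.append(current)
        prev = current                            # auto-regressive: the estimate is fed back
    return estimates


def doubling_predictor(prev: JointObs) -> JointObs:
    """Toy stand-in for a learned model (assumption): every entry doubles each step."""
    return [[2.0 * x for x in obs] for obs in prev]


# Agent 1's observation is unavailable from step 1 onward; -99.0 marks values never received.
observations = [[[1.0], [1.0]], [[2.0], [-99.0]], [[4.0], [-99.0]]]
availability = [[True, True], [True, False], [True, False]]
estimates = rollout(observations, availability, doubling_predictor, [[0.0], [0.0]])
```

Setting every entry of `availability` to `True` corresponds to the fully centralized case, while marking only the agent's own entry as available corresponds to the fully decentralized case; anything in between is handled by the same loop.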

Published

2026-03-14

How to Cite

Santos, P. P., Carvalho, D. S., Vasco, M., Sardinha, A., Santos, P. A., Paiva, A., & Melo, F. S. (2026). Centralized Training with Hybrid Execution in Multi-Agent Reinforcement Learning via Predictive Observation Imputation (Abstract Reprint). Proceedings of the AAAI Conference on Artificial Intelligence, 40(47), 39894–39894. https://doi.org/10.1609/aaai.v40i47.41409