Multi-agent In-context Coordination via Decentralized Memory Retrieval

Authors

  • Tao Jiang — National Key Laboratory of Novel Software Technology, Nanjing University, Nanjing, China; School of Artificial Intelligence, Nanjing University, Nanjing, China
  • Zichuan Lin — Tencent, Shenzhen, China
  • Lihe Li — National Key Laboratory of Novel Software Technology, Nanjing University, Nanjing, China; School of Artificial Intelligence, Nanjing University, Nanjing, China
  • Yi-Chen Li — National Key Laboratory of Novel Software Technology, Nanjing University, Nanjing, China; School of Artificial Intelligence, Nanjing University, Nanjing, China
  • Cong Guan — National Key Laboratory of Novel Software Technology, Nanjing University, Nanjing, China; School of Artificial Intelligence, Nanjing University, Nanjing, China
  • Lei Yuan — National Key Laboratory of Novel Software Technology, Nanjing University, Nanjing, China; School of Artificial Intelligence, Nanjing University, Nanjing, China
  • Zongzhang Zhang — National Key Laboratory of Novel Software Technology, Nanjing University, Nanjing, China; School of Artificial Intelligence, Nanjing University, Nanjing, China
  • Yang Yu — National Key Laboratory of Novel Software Technology, Nanjing University, Nanjing, China; School of Artificial Intelligence, Nanjing University, Nanjing, China
  • Deheng Ye — Tencent, Shenzhen, China

DOI:

https://doi.org/10.1609/aaai.v40i27.39394

Abstract

Large transformer models, trained on diverse datasets, have demonstrated impressive few-shot performance on previously unseen tasks without requiring parameter updates. This capability has also been explored in Reinforcement Learning (RL), where agents interact with the environment to retrieve context and maximize cumulative rewards, showcasing strong adaptability in complex settings. However, in cooperative Multi-Agent Reinforcement Learning (MARL), where agents must coordinate toward a shared goal, decentralized policy deployment can lead to mismatches in task alignment and reward assignment, limiting the efficiency of policy adaptation. To address this challenge, we introduce Multi-agent In-context Coordination via Decentralized Memory Retrieval (MAICC), a novel approach designed to enhance coordination through fast adaptation. Our method trains a centralized embedding model to capture fine-grained trajectory representations, followed by decentralized models that approximate the centralized one to obtain team-level task information. Based on the learned embeddings, relevant trajectories are retrieved as context and, combined with the agents' current sub-trajectories, inform decision-making. During decentralized execution, we introduce a novel memory mechanism that effectively balances test-time online data with offline memory. Based on the constructed memory, we propose a hybrid utility score that incorporates both individual- and team-level returns, ensuring proper credit assignment across agents. Extensive experiments on cooperative MARL benchmarks, including Level-Based Foraging (LBF) and SMAC (v1/v2), show that MAICC enables faster adaptation to unseen tasks compared to existing methods.
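To make the retrieval-and-scoring idea in the abstract concrete, here is a minimal sketch of the two ingredients it names: retrieving stored trajectories whose embeddings are closest to the current sub-trajectory, and a hybrid utility score mixing individual- and team-level returns. All function names, the cosine-similarity retrieval, and the trade-off weight `lam` are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def retrieve_context(query_emb, memory_embs, memory_trajs, k=2):
    """Return the k stored trajectories whose embeddings are most
    similar (by cosine similarity) to the current sub-trajectory
    embedding. A stand-in for the paper's memory retrieval step."""
    norms = np.linalg.norm(memory_embs, axis=1) * np.linalg.norm(query_emb)
    sims = memory_embs @ query_emb / np.maximum(norms, 1e-8)
    top = np.argsort(-sims)[:k]           # indices of most similar entries
    return [memory_trajs[i] for i in top]

def hybrid_utility(individual_return, team_return, lam=0.5):
    """Weighted mix of individual- and team-level returns; `lam`
    is a hypothetical trade-off weight (not specified in the abstract)."""
    return lam * individual_return + (1.0 - lam) * team_return
```

Retrieved trajectories would then be concatenated with the agent's current sub-trajectory to form the in-context prompt; the utility score would rank which memories to keep or evict.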

Published

2026-03-14

How to Cite

Jiang, T., Lin, Z., Li, L., Li, Y.-C., Guan, C., Yuan, L., Zhang, Z., Yu, Y., & Ye, D. (2026). Multi-agent In-context Coordination via Decentralized Memory Retrieval. Proceedings of the AAAI Conference on Artificial Intelligence, 40(27), 22363-22371. https://doi.org/10.1609/aaai.v40i27.39394

Section

AAAI Technical Track on Machine Learning IV