[1] Wu, C., Gan, Y., Xing, J. and Fu, Y. 2026. MARPO: A Reflective Policy Optimization for Multi-Agent Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence. 40, 35 (Mar. 2026), 29740-29748. DOI:https://doi.org/10.1609/aaai.v40i35.40219.