Wu, C., Gan, Y., Xing, J. and Fu, Y. (2026) “MARPO: A Reflective Policy Optimization for Multi-Agent Reinforcement Learning”, Proceedings of the AAAI Conference on Artificial Intelligence, 40(35), pp. 29740-29748. doi: 10.1609/aaai.v40i35.40219.