Wu, C., Y. Gan, J. Xing, and Y. Fu. “MARPO: A Reflective Policy Optimization for Multi-Agent Reinforcement Learning”. Proceedings of the AAAI Conference on Artificial Intelligence, vol. 40, no. 35, Mar. 2026, pp. 29740-8, doi:10.1609/aaai.v40i35.40219.