Anagnostides, Ioannis, Ioannis Panageas, Gabriele Farina, and Tuomas Sandholm. “Optimistic Policy Gradient in Multi-Player Markov Games With a Single Controller: Convergence Beyond the Minty Property.” Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 9 (March 24, 2024): 9451–9459. Accessed May 30, 2026. https://ojs.aaai.org/index.php/AAAI/article/view/28799.