[1] I. Anagnostides, I. Panageas, G. Farina, and T. Sandholm, “Optimistic Policy Gradient in Multi-Player Markov Games with a Single Controller: Convergence beyond the Minty Property,” AAAI, vol. 38, no. 9, pp. 9451–9459, Mar. 2024.