[1] W. Wang, H. Guo, Z. Lv, and S. Zhang, “A Rolling Stone Gathers No Moss: Adaptive Policy Optimization for Stable Self-Evaluation in Large Multimodal Models”, AAAI, vol. 40, no. 40, pp. 33666–33674, Mar. 2026.