Wang W, Guo H, Lv Z, Zhang S. A Rolling Stone Gathers No Moss: Adaptive Policy Optimization for Stable Self-Evaluation in Large Multimodal Models. AAAI [Internet]. 2026 Mar 14 [cited 2026 May 25];40(40):33666-74. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/40656