Wang, W., Guo, H., Lv, Z., & Zhang, S. (2026). A Rolling Stone Gathers No Moss: Adaptive Policy Optimization for Stable Self-Evaluation in Large Multimodal Models. Proceedings of the AAAI Conference on Artificial Intelligence, 40(40), 33666–33674. https://doi.org/10.1609/aaai.v40i40.40656