Jia, W., J. Lu, H. Yu, S. Wang, G. Tang, A.-L. Wang, W. Yin, D. Yang, Y. Nie, B. Shan, H. Feng, I. Li, K. Yang, H. Wang, J. Tang, T. Fu, C. Jin, C. Feng, X. Lv, and C. Huang. “MEML-GRPO: Heterogeneous Multi-Expert Mutual Learning for RLVR Advancement”. Proceedings of the AAAI Conference on Artificial Intelligence, vol. 40, no. 37, Mar. 2026, pp. 31283-91, doi:10.1609/aaai.v40i37.40391.