Chen, Junyi, Longteng Guo, Jia Sun, Shuai Shao, Zehuan Yuan, Liang Lin, and Dongyu Zhang. “EVE: Efficient Vision-Language Pre-Training With Masked Prediction and Modality-Aware MoE.” Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 2 (March 24, 2024): 1110–1119. Accessed May 14, 2026. https://ojs.aaai.org/index.php/AAAI/article/view/27872.