Chen, J., Guo, L., Sun, J., Shao, S., Yuan, Z., Lin, L., & Zhang, D. (2024). EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE. Proceedings of the AAAI Conference on Artificial Intelligence, 38(2), 1110–1119. https://doi.org/10.1609/aaai.v38i2.27872