[1] J. Chen, “EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE,” AAAI, vol. 38, no. 2, pp. 1110–1119, Mar. 2024.