[1] M. Rang, Z. Bi, C. Liu, Y. Tang, K. Han, and Y. Wang, “Eve: Efficient Multimodal Vision Language Models with Elastic Visual Experts,” AAAI, vol. 39, no. 7, pp. 6694–6702, Apr. 2025.