[1] L. Zhu, “Scalable Vision-Language Understanding and Generation,” AAAI, vol. 39, no. 27, p. 28738, Apr. 2025.