[1]
L. Chen, “Open-World Multimodal Understanding and Generation with Efficiently Finetuned Foundation Models,” AAAI, vol. 39, no. 27, p. 28706, Apr. 2025.