[1] S. Li, P. Wei, P. Qiao, C. Liu, and J. Chen, “DigitalLLaVA: Incorporating Digital Cognition Capability for Physical World Comprehension in Multimodal LLMs,” AAAI, vol. 39, no. 5, pp. 4932–4940, Apr. 2025.