(1) Zhang, Y.; Liu, Y.; Guo, Z.; Zhang, Y.; Yang, X.; Zhang, X.; Chen, C.; Song, J.; Yao, Y.; Chua, T.-S. LLaVA-UHD v2: Exploiting Hierarchical Vision Granularity in MLLMs via Inverse Semantic Pyramid. AAAI 2026, 40, 12934–12942.