(1) Liang, Y.; Jiao, J.; Feng, X.; Liu, X.; Liu, K.; Wang, Y.; Ye, Z.; Lu, H.; Wang, Z. IPFormer: Instance Prompt-Guided Transformer for Multi-Modal Multi-Shot Video Understanding. AAAI 2026, 40, 6907–6915.