Liang, Yujia, Jile Jiao, Xuetao Feng, Xinchen Liu, Kun Liu, Yuan Wang, Zixuan Ye, Hao Lu, and Zhicheng Wang. 2026. “IPFormer: Instance Prompt-Guided Transformer for Multi-Modal Multi-Shot Video Understanding.” Proceedings of the AAAI Conference on Artificial Intelligence 40 (9): 6907–15. https://doi.org/10.1609/aaai.v40i9.37624.