Liang, Yujia, et al. “IPFormer: Instance Prompt-Guided Transformer for Multi-Modal Multi-Shot Video Understanding.” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 40, no. 9, Mar. 2026, pp. 6907-15, doi:10.1609/aaai.v40i9.37624.