Yan, Shilin, Renrui Zhang, Ziyu Guo, Wenchao Chen, Wei Zhang, Hongyang Li, Yu Qiao, Hao Dong, Zhongjiang He, and Peng Gao. 2024. “Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation.” Proceedings of the AAAI Conference on Artificial Intelligence 38 (6): 6449–57. https://doi.org/10.1609/aaai.v38i6.28465.