Yan, S., Zhang, R., Guo, Z., Chen, W., Zhang, W., Li, H., Qiao, Y., Dong, H., He, Z., & Gao, P. (2024). Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation. Proceedings of the AAAI Conference on Artificial Intelligence, 38(6), 6449-6457. https://doi.org/10.1609/aaai.v38i6.28465