Ma, Y. (2024) “Unifying Visual and Vision-Language Tracking via Contrastive Learning”, Proceedings of the AAAI Conference on Artificial Intelligence, 38(5), pp. 4107–4116. doi: 10.1609/aaai.v38i5.28205.