Ma, Yinchao, et al. “Unifying Visual and Vision-Language Tracking via Contrastive Learning”. Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 5, Mar. 2024, pp. 4107-16, doi:10.1609/aaai.v38i5.28205.