[1] Y. Ma, Y. Tang, W. Yang, T. Zhang, J. Zhang, and M. Kang, “Unifying Visual and Vision-Language Tracking via Contrastive Learning,” AAAI, vol. 38, no. 5, pp. 4107–4116, Mar. 2024.