Ma, Y., Tang, Y., Yang, W., Zhang, T., Zhang, J., & Kang, M. (2024). Unifying Visual and Vision-Language Tracking via Contrastive Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 38(5), 4107–4116. https://doi.org/10.1609/aaai.v38i5.28205