YU, Yating; CAO, Congqi; ZHANG, Yueran; LV, Qinyi; MIN, Lingtong; ZHANG, Yanning. Building a Multi-modal Spatiotemporal Expert for Zero-shot Action Recognition with CLIP. Proceedings of the AAAI Conference on Artificial Intelligence, [S. l.], v. 39, n. 9, p. 9689–9697, 2025. DOI: 10.1609/aaai.v39i9.33050. Available at: https://ojs.aaai.org/index.php/AAAI/article/view/33050. Accessed on: 10 May 2026.