[1] Yu, Y. et al. 2025. Building a Multi-modal Spatiotemporal Expert for Zero-shot Action Recognition with CLIP. Proceedings of the AAAI Conference on Artificial Intelligence. 39, 9 (Apr. 2025), 9689–9697. DOI:https://doi.org/10.1609/aaai.v39i9.33050.