Yu, Y. (2025) “Building a Multi-modal Spatiotemporal Expert for Zero-shot Action Recognition with CLIP”, Proceedings of the AAAI Conference on Artificial Intelligence, 39(9), pp. 9689–9697. doi: 10.1609/aaai.v39i9.33050.