[1] T. Son, S. W. Seo, J. Kim, S. H. Lee, and J. W. Choi, “JoVALE: Detecting Human Actions in Video Using Audiovisual and Language Contexts,” AAAI, vol. 39, no. 7, pp. 6940–6949, Apr. 2025.