1. Jia M, Meng W, Fu Z, Li Y, Zeng Q, Zhang Y, et al. Explicit Temporal-Semantic Modeling for Dense Video Captioning via Context-Aware Cross-Modal Interaction. AAAI [Internet]. 2026 Mar. 14 [cited 2026 May 12];40(7):5341-9. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/37450