1.
Yamazaki K, Vo K, Truong QS, Raj B, Le N. VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning. AAAI [Internet]. 2023 Jun 26 [cited 2024 Nov 19];37(3):3081-90. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/25412