[1] Y. Lu, “Set Prediction Guided by Semantic Concepts for Diverse Video Captioning,” AAAI, vol. 38, no. 4, pp. 3909–3917, Mar. 2024.