[1] Y. Lu, “Set Prediction Guided by Semantic Concepts for Diverse Video Captioning,” AAAI, vol. 38, no. 4, pp. 3909–3917, Mar. 2024.