Lu, Y., Zhang, Z., Yuan, C., Li, P., Wang, Y., Li, B., & Hu, W. (2024). Set Prediction Guided by Semantic Concepts for Diverse Video Captioning. Proceedings of the AAAI Conference on Artificial Intelligence, 38(4), 3909–3917. https://doi.org/10.1609/aaai.v38i4.28183