Zhong, X., Li, Z., Chen, S., Jiang, K., Chen, C., & Ye, M. (2023). Refined Semantic Enhancement towards Frequency Diffusion for Video Captioning. Proceedings of the AAAI Conference on Artificial Intelligence, 37(3), 3724-3732. https://doi.org/10.1609/aaai.v37i3.25484