[1] B. Yang, Y. Zou, F. Liu, and C. Zhang, “Non-Autoregressive Coarse-to-Fine Video Captioning”, AAAI, vol. 35, no. 4, pp. 3119–3127, May 2021.