[1] W. Chen, J. Niu, X. Liu, Z. Wang, S. Tang, and G. Zhu, “DiffDVC: Accurate Event Detection for Dense Video Captioning via Diffusion Models,” AAAI, vol. 39, no. 2, pp. 2221–2229, Apr. 2025.