[1] B. Li, F. Yang, Y. Mao, Q. Ye, H. Chen, and Y. Zhong, “Tri-Ergon: Fine-Grained Video-to-Audio Generation with Multi-Modal Conditions and LUFS Control,” AAAI, vol. 39, no. 5, pp. 4616–4624, Apr. 2025.