Li, Bingliang, Fengyu Yang, Yuxin Mao, Qingwen Ye, Hongkai Chen, and Yiran Zhong. “Tri-Ergon: Fine-Grained Video-to-Audio Generation With Multi-Modal Conditions and LUFS Control.” Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 5 (April 11, 2025): 4616–4624. Accessed May 10, 2026. https://ojs.aaai.org/index.php/AAAI/article/view/32487.