Zhou, Dongzhan, Xinchi Zhou, Di Hu, Hang Zhou, Lei Bai, Ziwei Liu, and Wanli Ouyang. “SepFusion: Finding Optimal Fusion Structures for Visual Sound Separation.” Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 3 (June 28, 2022): 3544–3552. Accessed May 24, 2026. https://ojs.aaai.org/index.php/AAAI/article/view/20266.