SepFusion: Finding Optimal Fusion Structures for Visual Sound Separation

Dongzhan Zhou; Xinchi Zhou; Di Hu; Hang Zhou; Lei Bai; Ziwei Liu; Wanli Ouyang

doi:10.1609/aaai.v36i3.20266

Authors

Dongzhan Zhou, The University of Sydney
Xinchi Zhou, The University of Sydney
Di Hu, Renmin University of China
Hang Zhou, Baidu Inc.
Lei Bai, The University of Sydney
Ziwei Liu, Nanyang Technological University
Wanli Ouyang, The University of Sydney

DOI:

https://doi.org/10.1609/aaai.v36i3.20266

Keywords:

Computer Vision (CV)

Abstract

Multiple modalities can provide rich semantic information, and exploiting such information normally leads to better performance than single-modality counterparts. However, devising an effective cross-modal fusion structure is difficult because feature dimensions and semantics vary across modalities, especially when the inputs come from different sensors, as in audio-visual learning. In this work, we propose SepFusion, a novel framework that can smoothly produce optimal fusion structures for visual sound separation. The framework is composed of two components: the model generator and the evaluator. To construct the generator, we devise a lightweight architecture space that can adapt to different input modalities, so that audio-visual fusion structures can be obtained easily according to our demands. For the evaluator, we adopt the idea of neural architecture search to select superior networks effectively. This automatic process can significantly save human effort while achieving competitive performance. Moreover, since SepFusion provides a series of strong models, the model family can be used for broader applications, such as further improving performance via model assembly or providing suitable architectures for the separation of certain instrument classes. These potential applications further enhance the competitiveness of our approach.
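
The generator/evaluator split described in the abstract can be illustrated with a toy search loop. The sketch below is not the SepFusion implementation: the search space (`FUSION_OPS`, `FUSION_STAGES`, `HIDDEN_WIDTHS`), the `FusionStructure` fields, and the `proxy_score` function are all hypothetical stand-ins chosen for illustration, and plain random sampling is used in place of whatever search strategy the paper adopts. It only shows the general shape of "sample candidate fusion structures, score them, keep a family of the best ones".

```python
from __future__ import annotations

import random
from dataclasses import dataclass

# Hypothetical search space: these choices are illustrative only and are
# not taken from the paper.
FUSION_OPS = ("concat", "sum", "product")
FUSION_STAGES = (1, 2, 3)
HIDDEN_WIDTHS = (32, 64, 128)


@dataclass(frozen=True)
class FusionStructure:
    """One candidate audio-visual fusion structure (toy description)."""
    op: str      # how audio and visual features are combined
    stage: int   # at which stage of the network fusion happens
    width: int   # hidden width of the fused representation


def generate(rng: random.Random) -> FusionStructure:
    """Generator: sample one candidate from the toy architecture space."""
    return FusionStructure(
        op=rng.choice(FUSION_OPS),
        stage=rng.choice(FUSION_STAGES),
        width=rng.choice(HIDDEN_WIDTHS),
    )


def proxy_score(candidate: FusionStructure) -> float:
    """Evaluator stand-in: a made-up deterministic score.

    A real evaluator would estimate separation quality for each network;
    this function exists only so the loop below can run.
    """
    op_bonus = {"concat": 0.2, "sum": 0.1, "product": 0.3}[candidate.op]
    return op_bonus + 0.1 * candidate.stage + candidate.width / 1000


def search(num_candidates: int, top_k: int, seed: int = 0) -> list[FusionStructure]:
    """Sample candidates, drop duplicates, and keep the top_k by score."""
    rng = random.Random(seed)
    candidates = {generate(rng) for _ in range(num_candidates)}
    return sorted(candidates, key=proxy_score, reverse=True)[:top_k]


if __name__ == "__main__":
    for structure in search(num_candidates=200, top_k=3):
        print(structure, round(proxy_score(structure), 3))
```

Returning the top few candidates rather than a single winner mirrors the abstract's point that a family of strong models is useful in its own right, for example for combining several models or picking one suited to a particular instrument class.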
