Segment beyond View: Handling Partially Missing Modality for Audio-Visual Semantic Segmentation

Authors

  • Renjie Wu, The University of Adelaide
  • Hu Wang, The University of Adelaide
  • Feras Dayoub, The University of Adelaide
  • Hsiang-Ting Chen, The University of Adelaide

DOI:

https://doi.org/10.1609/aaai.v38i6.28426

Keywords:

CV: Multi-modal Vision, CV: Segmentation, HAI: Human-Computer Interaction

Abstract

Augmented Reality (AR) devices, emerging as prominent mobile interaction platforms, face challenges in user safety, particularly concerning oncoming vehicles. While some solutions leverage onboard camera arrays, these cameras often have a limited field-of-view (FoV) with front or downward perspectives. Addressing this, we propose a new out-of-view semantic segmentation task and Segment Beyond View (SBV), a novel audio-visual semantic segmentation method. SBV supplements the visual modality, which misses information beyond the FoV, with auditory information using a teacher-student distillation model (Omni2Ego). The model consists of a vision teacher utilising panoramic information, an auditory teacher with 8-channel audio, and an audio-visual student that takes a view with limited FoV and binaural audio as input and produces semantic segmentation for objects outside the FoV. SBV outperforms existing models in comparative evaluations and shows consistent performance across varying FoV ranges and in monaural audio settings.
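
To make the described architecture concrete, below is a minimal sketch of an Omni2Ego-style teacher-student distillation setup in PyTorch. It is only an illustration under stated assumptions: every module name, backbone, channel count, and loss weight is a hypothetical placeholder reflecting the structure in the abstract (a panoramic vision teacher, an 8-channel audio teacher, and a limited-FoV binaural student), not the authors' implementation.

```python
# Hypothetical sketch of an Omni2Ego-style teacher-student setup.
# All names, shapes, and loss weights are illustrative placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 19  # assumed number of semantic classes


def conv_encoder(in_ch: int) -> nn.Sequential:
    """Tiny convolutional encoder standing in for a real backbone."""
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
    )


class VisionTeacher(nn.Module):
    """Teacher that sees the full panoramic RGB view."""
    def __init__(self):
        super().__init__()
        self.encoder = conv_encoder(3)
        self.head = nn.Conv2d(64, NUM_CLASSES, 1)

    def forward(self, panorama):
        feats = self.encoder(panorama)
        return feats, self.head(feats)


class AudioTeacher(nn.Module):
    """Teacher that hears 8-channel audio (e.g. per-channel spectrograms)."""
    def __init__(self):
        super().__init__()
        self.encoder = conv_encoder(8)
        self.head = nn.Conv2d(64, NUM_CLASSES, 1)

    def forward(self, audio_spec):
        feats = self.encoder(audio_spec)
        return feats, self.head(feats)


class AudioVisualStudent(nn.Module):
    """Student: limited-FoV RGB plus binaural (2-channel) audio in,
    semantic segmentation of out-of-view objects out."""
    def __init__(self):
        super().__init__()
        self.img_encoder = conv_encoder(3)
        self.audio_encoder = conv_encoder(2)
        self.fuse = nn.Conv2d(128, 64, 1)  # fuse concatenated A/V features
        self.head = nn.Conv2d(64, NUM_CLASSES, 1)

    def forward(self, image, audio_spec):
        v = self.img_encoder(image)
        a = self.audio_encoder(audio_spec)
        # Resize audio features to the visual feature map before fusion.
        a = F.interpolate(a, size=v.shape[-2:], mode="bilinear",
                          align_corners=False)
        fused = self.fuse(torch.cat([v, a], dim=1))
        return fused, self.head(fused)


def distillation_loss(student_out, vision_t_out, audio_t_out, labels,
                      w_seg=1.0, w_vis=0.5, w_aud=0.5):
    """Segmentation loss plus feature matching to both (frozen) teachers.

    `labels` are class indices at the student logits' spatial resolution.
    """
    s_feat, s_logits = student_out
    v_feat, _ = vision_t_out
    a_feat, _ = audio_t_out
    # Align teacher feature maps to the student's spatial size.
    v_feat = F.interpolate(v_feat, size=s_feat.shape[-2:], mode="bilinear",
                           align_corners=False)
    a_feat = F.interpolate(a_feat, size=s_feat.shape[-2:], mode="bilinear",
                           align_corners=False)
    seg = F.cross_entropy(s_logits, labels)
    return (w_seg * seg
            + w_vis * F.mse_loss(s_feat, v_feat)
            + w_aud * F.mse_loss(s_feat, a_feat))
```

In a real pipeline the two teachers would be pretrained and frozen, and the student trained to match their features while predicting out-of-view segmentation; the MSE feature-matching terms above merely stand in for whatever distillation objective the paper uses.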

Published

2024-03-24

How to Cite

Wu, R., Wang, H., Dayoub, F., & Chen, H.-T. (2024). Segment beyond View: Handling Partially Missing Modality for Audio-Visual Semantic Segmentation. Proceedings of the AAAI Conference on Artificial Intelligence, 38(6), 6100-6108. https://doi.org/10.1609/aaai.v38i6.28426

Issue

Vol. 38 No. 6 (2024)

Section

AAAI Technical Track on Computer Vision V