COCA: COllaborative CAusal Regularization for Audio-Visual Question Answering

Mingrui Lao; Nan Pu; Yu Liu; Kai He; Erwin M. Bakker; Michael S. Lew

doi:10.1609/aaai.v37i11.26527

Authors

Mingrui Lao, Leiden University
Nan Pu, Leiden University
Yu Liu, Dalian University of Technology
Kai He, Leiden University
Erwin M. Bakker, Leiden University
Michael S. Lew, Leiden University

DOI:

https://doi.org/10.1609/aaai.v37i11.26527

Keywords:

SNLP: Speech and Multimodality, CV: Language and Vision, CV: Multi-modal Vision

Abstract

Audio-Visual Question Answering (AVQA) is a sophisticated QA task that aims to answer textual questions about given video-audio pairs through comprehensive multimodal reasoning. Through detailed causal-graph analyses and careful inspection of their learning processes, we reveal that AVQA models are not only prone to over-exploiting the prevalent language bias, but also suffer from additional joint-modal biases caused by shortcut relations between textual-auditory/visual co-occurrences and dominant answers. In this paper, we propose COllaborative CAusal (COCA) Regularization to remedy this more challenging data-bias issue. Specifically, a novel Bias-centered Causal Regularization (BCR) is proposed to alleviate specific shortcut biases by intervening on bias-irrelevant causal effects, and to further introspect the predictions of AVQA models in counterfactual and factual scenarios. Because the dominant bias that impairs model robustness tends to differ from sample to sample, we introduce Multi-shortcut Collaborative Debiasing (MCD) to measure how much each sample suffers from each bias and to dynamically adjust its debiasing concentration across the different shortcut correlations. Extensive experiments demonstrate the effectiveness and backbone-agnostic nature of our COCA strategy, which achieves state-of-the-art performance on the large-scale MUSIC-AVQA dataset.
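The idea of weighting several shortcut biases per sample can be illustrated with a small sketch. This is not the paper's BCR or MCD formulation, which the abstract does not specify. It assumes three hypothetical bias-only branches (question only, question plus audio, question plus visual), weights each branch by how confidently it already predicts the ground-truth answer, and uses a plain subtraction of weighted bias logits as a simple stand-in for the debiasing step. All function names, branch names, and numbers are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bias_weights(branch_logits, answer_idx):
    """Per-sample weights over bias branches (illustrative assumption).

    A branch gets a larger weight when it already assigns high
    probability to the ground-truth answer, i.e. when the sample
    looks answerable through that shortcut alone.
    """
    confidences = {
        name: softmax(logits)[answer_idx]
        for name, logits in branch_logits.items()
    }
    total = sum(confidences.values())
    return {name: c / total for name, c in confidences.items()}


def debiased_logits(fused_logits, branch_logits, weights):
    """Subtract the weighted bias-only logits from the fused logits."""
    out = list(fused_logits)
    for name, logits in branch_logits.items():
        for i, value in enumerate(logits):
            out[i] -= weights[name] * value
    return out


# Hypothetical sample with three candidate answers; index 0 is correct.
branches = {
    "question_only": [2.0, 0.0, 0.0],
    "question_audio": [0.0, 1.0, 0.0],
    "question_visual": [0.0, 0.0, 0.0],
}
weights = bias_weights(branches, answer_idx=0)
adjusted = debiased_logits([2.5, 1.0, 0.5], branches, weights)
```

In this toy sample the question-only branch receives the largest weight, so most of the correction is directed at the language shortcut. A different sample could shift the weight toward one of the joint-modal branches.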
