BayesVQA: Energy-Guided Bayesian Debiasing for Language-Bias-Robust Visual Question Answering

Zhiqi Huang; Huanjia Zhu; Xiangwen Deng; Zhong Qinghao; Bingzhi Chen

doi:10.1609/aaai.v40i7.37437

Authors

Zhiqi Huang, Beijing Institute of Technology
Huanjia Zhu, Beijing Institute of Technology
Xiangwen Deng, Beijing Institute of Technology
Zhong Qinghao, Beijing Institute of Technology
Bingzhi Chen, Beijing Institute of Technology

DOI: https://doi.org/10.1609/aaai.v40i7.37437

Abstract

Numerous studies have demonstrated that Visual Question Answering (VQA) models are vulnerable to language priors and dataset biases, often leading to spurious correlations between questions and answers. As a result, these models excessively rely on linguistic cues, neglecting essential visual information and causing representational distortions. To address this challenge, we propose a novel Bayesian debiasing framework termed BayesVQA, which integrates three carefully designed mechanisms: Energy-guided Prior Variance (EPV), Energy-guided Posterior Sampling (EPS), and Energy-guided Likelihood Reweighting (ELR). Specifically, we explicitly decompose each sample's latent representation into a biased feature and a stochastic corrective perturbation δ. Using a Bayesian formulation, we model the posterior distribution of the perturbation δ conditioned on the predictive uncertainty, quantified via calibrated energy scores. To mitigate language bias, the posterior is optimized through energy-driven variational inference with an uncertainty-adaptive prior and sampling strategy. Moreover, the ELR mechanism incorporates an energy-based weighting of the reconstruction objective and enforces an energy-coherence constraint to emphasize challenging, high-uncertainty instances and align model confidence before and after debiasing. Extensive experiments conducted across multiple standard VQA benchmarks consistently validate the superior performance of our BayesVQA method over state-of-the-art competitors under distributional shifts and challenging bias conditions.
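
To make the role of the energy score concrete, the following minimal sketch computes the standard free-energy score from a model's answer logits and turns it into per-sample weights and a perturbed feature. It is an illustration only, not the authors' implementation: the abstract does not specify how the energy is calibrated, how the weighting is defined, or how the variance of δ is set, so the functions `energy_weights` and `corrected_feature`, the softmax weighting, and the parameters `temperature` and `base_std` are all assumptions introduced here.

```python
import math
import random


def energy_score(logits, temperature=1.0):
    """Free energy -T * logsumexp(logits / T).

    Lower (more negative) energy corresponds to a more confident prediction.
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    return -temperature * (m + math.log(sum(math.exp(s - m) for s in scaled)))


def energy_weights(energies):
    """Hypothetical weighting: softmax over the batch energies.

    Higher-energy (more uncertain) samples receive larger weights.
    """
    m = max(energies)
    exps = [math.exp(e - m) for e in energies]
    total = sum(exps)
    return [x / total for x in exps]


def corrected_feature(biased, weight, base_std=0.1, rng=None):
    """Hypothetical decomposition: biased feature plus a Gaussian delta.

    The standard deviation of delta is scaled by the sample's energy weight
    (an assumption, standing in for an uncertainty-adaptive variance).
    """
    rng = rng or random.Random(0)
    std = base_std * weight
    return [b + rng.gauss(0.0, std) for b in biased]


# Toy batch: one confident prediction and one near-uniform prediction.
batch_logits = [[10.0, 0.0, 0.0], [0.1, 0.0, -0.1]]
energies = [energy_score(logits) for logits in batch_logits]
weights = energy_weights(energies)
features = [corrected_feature([1.0, 2.0], w) for w in weights]
```

In this toy batch the near-uniform prediction has the higher energy, so it receives the larger weight and the larger perturbation scale, which mirrors the abstract's stated goal of emphasizing high-uncertainty instances.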
