BayesVQA: Energy-Guided Bayesian Debiasing for Language-Bias-Robust Visual Question Answering
DOI:
https://doi.org/10.1609/aaai.v40i7.37437
Abstract
Numerous studies have demonstrated that Visual Question Answering (VQA) models are vulnerable to language priors and dataset biases, often leading to spurious correlations between questions and answers. As a result, these models excessively rely on linguistic cues, neglecting essential visual information and causing representational distortions. To address this challenge, we propose a novel Bayesian debiasing framework termed BayesVQA, which integrates three carefully designed mechanisms: Energy-guided Prior Variance (EPV), Energy-guided Posterior Sampling (EPS), and Energy-guided Likelihood Reweighting (ELR). Specifically, we explicitly decompose each sample's latent representation into a biased feature and a stochastic corrective perturbation δ. Using a Bayesian formulation, we model the posterior distribution of the perturbation δ conditioned on the predictive uncertainty, quantified via calibrated energy scores. To mitigate language bias, the posterior is optimized through energy-driven variational inference with an uncertainty-adaptive prior and sampling strategy. Moreover, the ELR mechanism incorporates an energy-based weighting of the reconstruction objective and enforces an energy-coherence constraint to emphasize challenging, high-uncertainty instances and align model confidence before and after debiasing. Extensive experiments conducted across multiple standard VQA benchmarks consistently validate the superior performance of our BayesVQA method over state-of-the-art competitors under distributional shifts and challenging bias conditions.
Published
2026-03-14
How to Cite
Huang, Z., Zhu, H., Deng, X., Qinghao, Z., & Chen, B. (2026). BayesVQA: Energy-Guided Bayesian Debiasing for Language-Bias-Robust Visual Question Answering. Proceedings of the AAAI Conference on Artificial Intelligence, 40(7), 5221–5229. https://doi.org/10.1609/aaai.v40i7.37437
Issue
Section
AAAI Technical Track on Computer Vision IV