BayesVQA: Energy-Guided Bayesian Debiasing for Language-Bias-Robust Visual Question Answering

Authors

  • Zhiqi Huang, Beijing Institute of Technology
  • Huanjia Zhu, Beijing Institute of Technology
  • Xiangwen Deng, Beijing Institute of Technology
  • Zhong Qinghao, Beijing Institute of Technology
  • Bingzhi Chen, Beijing Institute of Technology

DOI:

https://doi.org/10.1609/aaai.v40i7.37437

Abstract

Numerous studies have demonstrated that Visual Question Answering (VQA) models are vulnerable to language priors and dataset biases, often leading to spurious correlations between questions and answers. As a result, these models excessively rely on linguistic cues, neglecting essential visual information and causing representational distortions. To address this challenge, we propose a novel Bayesian debiasing framework termed BayesVQA, which integrates three carefully designed mechanisms: Energy-guided Prior Variance (EPV), Energy-guided Posterior Sampling (EPS), and Energy-guided Likelihood Reweighting (ELR). Specifically, we explicitly decompose each sample's latent representation into a biased feature and a stochastic corrective perturbation δ. Using a Bayesian formulation, we model the posterior distribution of the perturbation δ conditioned on the predictive uncertainty, quantified via calibrated energy scores. To mitigate language bias, the posterior is optimized through energy-driven variational inference with an uncertainty-adaptive prior and sampling strategy. Moreover, the ELR mechanism incorporates an energy-based weighting of the reconstruction objective and enforces an energy-coherence constraint to emphasize challenging, high-uncertainty instances and align model confidence before and after debiasing. Extensive experiments conducted across multiple standard VQA benchmarks consistently validate the superior performance of our BayesVQA method over state-of-the-art competitors under distributional shifts and challenging bias conditions.
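
As a rough illustration of the energy-based uncertainty signal mentioned in the abstract, the sketch below computes the commonly used free-energy score from a classifier's answer logits and converts a batch of energies into sample weights that favor high-uncertainty instances. This is a sketch under stated assumptions, not the BayesVQA implementation: the abstract does not give the exact calibrated energy formula or the ELR weighting rule, so the log-sum-exp energy, the softmax weighting, and the names `free_energy`, `energy_weights`, and `temperature` are all hypothetical choices made for illustration.

```python
import math


def free_energy(logits, temperature=1.0):
    """Free-energy score E = -T * log(sum_k exp(logit_k / T)).

    Assumption: this is the standard energy definition for classifiers,
    not necessarily the calibrated score used in the paper.
    """
    scaled = [v / temperature for v in logits]
    m = max(scaled)  # subtract the max for numerical stability
    return -temperature * (m + math.log(sum(math.exp(s - m) for s in scaled)))


def energy_weights(energies):
    """Softmax over energies (hypothetical weighting rule).

    Higher energy means lower confidence, so uncertain samples
    receive larger weights; the weights sum to 1.
    """
    m = max(energies)
    exps = [math.exp(e - m) for e in energies]
    total = sum(exps)
    return [x / total for x in exps]


# One peaked (confident) and one flat (uncertain) answer distribution.
confident = free_energy([8.0, 0.0, 0.0])
uncertain = free_energy([1.0, 1.0, 1.0])
weights = energy_weights([confident, uncertain])
```

Under this definition the flat prediction has the higher energy, so it receives the larger weight, which matches the abstract's stated goal of emphasizing challenging, high-uncertainty instances.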

Published

2026-03-14

How to Cite

Huang, Z., Zhu, H., Deng, X., Qinghao, Z., & Chen, B. (2026). BayesVQA: Energy-Guided Bayesian Debiasing for Language-Bias-Robust Visual Question Answering. Proceedings of the AAAI Conference on Artificial Intelligence, 40(7), 5221–5229. https://doi.org/10.1609/aaai.v40i7.37437

Section

AAAI Technical Track on Computer Vision IV