MacVQA: Adaptive Memory Allocation and Global Noise Filtering for Continual Visual Question Answering

Authors

  • Zhifei Li, School of Computer Science, Hubei University, Wuhan 430062, China; Hubei Key Laboratory of Big Data Intelligent Analysis and Application (Hubei University), Wuhan 430062, China; Key Laboratory of Intelligent Sensing System and Security (Hubei University), Ministry of Education, Wuhan 430062, China
  • Yiran Wang, School of Computer Science, Hubei University, Wuhan 430062, China
  • Chenyi Xiong, School of Computer Science, Hubei University, Wuhan 430062, China
  • Yujing Xia, School of Computer Science, Hubei University, Wuhan 430062, China
  • Xiaoju Hou, Institute of Vocational Education, Guangdong Industry Polytechnic University, Guangzhou 510300, China
  • Yue Zhao, Shandong Police College, Ji’nan 250200, China
  • Miao Zhang, School of Computer Science, Hubei University, Wuhan 430062, China
  • Kui Xiao, School of Computer Science, Hubei University, Wuhan 430062, China
  • Bing Yang, School of Computer Science, Hubei University, Wuhan 430062, China

DOI:

https://doi.org/10.1609/aaai.v40i38.40458

Abstract

Visual Question Answering (VQA) requires models to reason over multimodal information, combining visual and textual data. Continual learning has brought significant progress in retaining knowledge and adapting to new information in the VQA domain. However, current methods often struggle to balance knowledge retention, adaptation, and robust feature representation. To address these challenges, we propose MacVQA, a novel framework for continual visual question answering that combines adaptive memory allocation with global noise filtering. MacVQA fuses visual and question information while filtering noise to ensure robust representations, and employs prototype-based memory allocation to optimize feature quality and memory usage. These designs enable MacVQA to balance knowledge acquisition, retention, and compositional generalization in continual VQA learning. Experiments on ten continual VQA tasks show that MacVQA outperforms existing baselines, achieving 43.38% average accuracy and 2.32% average forgetting on standard tasks, and 42.53% average accuracy and 3.60% average forgetting on novel composition tasks.
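
The abstract reports results as average accuracy and average forgetting. The page does not give the exact formulas, so the following is only a minimal sketch that assumes the definitions commonly used in continual learning: average accuracy is the mean accuracy over all tasks after training on the last one, and average forgetting is the mean drop from each earlier task's best past accuracy to its final accuracy. The function names and the toy accuracy matrix are hypothetical and are not taken from the paper.

```python
from typing import List


def average_accuracy(acc: List[List[float]]) -> float:
    """Mean accuracy over all tasks after training on the final task.

    acc[t][j] is the accuracy on task j measured after training on task t.
    """
    final = acc[-1]
    return sum(final) / len(final)


def average_forgetting(acc: List[List[float]]) -> float:
    """Mean drop from each earlier task's best past accuracy to its final accuracy.

    Assumed definition (common in continual learning, not confirmed by the source).
    The last task is excluded because it has had no chance to be forgotten.
    """
    num_tasks = len(acc)
    if num_tasks < 2:
        return 0.0
    drops = []
    for j in range(num_tasks - 1):
        # Best accuracy on task j at any step from when it was learned
        # up to, but not including, the final step.
        best_before = max(acc[t][j] for t in range(j, num_tasks - 1))
        drops.append(best_before - acc[num_tasks - 1][j])
    return sum(drops) / len(drops)


# Toy numbers, made up for illustration only (three tasks, accuracies in percent).
acc = [
    [50.0, 0.0, 0.0],
    [47.0, 40.0, 0.0],
    [46.0, 38.0, 45.0],
]

print(average_accuracy(acc))    # mean of the last row
print(average_forgetting(acc))  # mean drop over the first two tasks
```

Under these assumed definitions, a higher average accuracy and a lower average forgetting are both better, which is how the figures in the abstract are meant to be read.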

Published

2026-03-14

How to Cite

Li, Z., Wang, Y., Xiong, C., Xia, Y., Hou, X., Zhao, Y., … Yang, B. (2026). MacVQA: Adaptive Memory Allocation and Global Noise Filtering for Continual Visual Question Answering. Proceedings of the AAAI Conference on Artificial Intelligence, 40(38), 31888–31896. https://doi.org/10.1609/aaai.v40i38.40458

Section

AAAI Technical Track on Natural Language Processing III