Multimodal Commonsense Knowledge Distillation for Visual Question Answering (Student Abstract)
DOI: https://doi.org/10.1609/aaai.v39i28.35320
Abstract
Existing Multimodal Large Language Models (MLLMs) and Visual Language Pretrained Models (VLPMs) have shown remarkable performance in general Visual Question Answering (VQA). However, these models struggle with VQA questions that require external commonsense knowledge, owing to the difficulty of generating high-quality prompts and the high computational cost of fine-tuning. In this work, we propose a novel graph-based multimodal commonsense knowledge distillation framework that constructs a unified relational graph over commonsense knowledge, visual objects, and questions, processed by a Graph Convolutional Network (GCN) in a teacher-student setting. The proposed framework is flexible with respect to the choice of teacher and student models without further fine-tuning, and achieves competitive performance on the ScienceQA dataset. The code is available at https://github.com/adlnlp/MCKDVQA.
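To make the distillation setup concrete, the following is a minimal sketch assuming a PyTorch implementation: a small GCN student operating over a unified graph of question, visual-object, and commonsense nodes, trained against a frozen teacher's answer logits with a standard KL-divergence distillation objective. All class and function names (GCNLayer, UnifiedGraphGCN, distillation_loss), the graph construction, and the hyperparameters are illustrative assumptions, not taken from the paper or its released code.

```python
# Minimal sketch only: an illustrative GCN student distilled from a frozen
# teacher's answer logits. Names and graph construction are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GCNLayer(nn.Module):
    """One graph-convolution step: propagate features over a row-normalized adjacency."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # adj: (N, N) adjacency with self-loops; row-normalize before propagation.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        return F.relu(self.linear((adj / deg) @ x))


class UnifiedGraphGCN(nn.Module):
    """Student over a unified graph whose nodes are question tokens,
    visual objects, and retrieved commonsense facts (already encoded as vectors)."""
    def __init__(self, in_dim, hid_dim, num_answers):
        super().__init__()
        self.gcn1 = GCNLayer(in_dim, hid_dim)
        self.gcn2 = GCNLayer(hid_dim, hid_dim)
        self.classifier = nn.Linear(hid_dim, num_answers)

    def forward(self, node_feats, adj):
        h = self.gcn2(self.gcn1(node_feats, adj), adj)
        graph_repr = h.mean(dim=0)          # mean-pool node embeddings
        return self.classifier(graph_repr)  # answer logits


def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Standard KD objective: KL to the teacher's softened distribution plus cross-entropy.
    student_logits, teacher_logits: (batch, num_answers); labels: (batch,)."""
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  F.softmax(teacher_logits / T, dim=-1),
                  reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce


if __name__ == "__main__":
    # Toy graph: 12 nodes (question + object + commonsense) with 256-d features.
    num_nodes, in_dim, num_answers = 12, 256, 4
    node_feats = torch.randn(num_nodes, in_dim)
    adj = (torch.rand(num_nodes, num_nodes) > 0.7).float()
    adj = ((adj + adj.T + torch.eye(num_nodes)) > 0).float()  # symmetric + self-loops

    student = UnifiedGraphGCN(in_dim, 128, num_answers)
    teacher_logits = torch.randn(1, num_answers)   # frozen teacher's answer scores
    labels = torch.tensor([1])

    logits = student(node_feats, adj).unsqueeze(0)
    loss = distillation_loss(logits, teacher_logits, labels)
    loss.backward()
    print(f"loss = {loss.item():.4f}")
```

In practice, the node features would come from pretrained text and visual encoders and the teacher logits from an MLLM or VLPM; the toy example only illustrates the message-passing and distillation loss shapes.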
Published
2025-04-11
How to Cite
Yang, S., Luo, S., & Han, S. C. (2025). Multimodal Commonsense Knowledge Distillation for Visual Question Answering (Student Abstract). Proceedings of the AAAI Conference on Artificial Intelligence, 39(28), 29545–29547. https://doi.org/10.1609/aaai.v39i28.35320
Issue
Vol. 39 No. 28 (2025)
Section
AAAI Student Abstract and Poster Program