Hybrid-DMKG: A Hybrid Reasoning Framework over Dynamic Multimodal Knowledge Graphs for Multimodal Multihop QA with Knowledge Editing

Li Yuan; Qingfei Huang; Bingshan Zhu; Yi Cai; Qingbao Huang; Changmeng Zheng; Zikun Deng; Tao Wang

doi:10.1609/aaai.v40i33.40028

Authors

Li Yuan: School of Software Engineering, South China University of Technology, Guangzhou, China; Key Laboratory of Big Data and Intelligent Robot (SCUT), MOE of China
Qingfei Huang: School of Software Engineering, South China University of Technology, Guangzhou, China; Key Laboratory of Big Data and Intelligent Robot (SCUT), MOE of China
Bingshan Zhu: School of Big Data and Artificial Intelligence, Guangdong University of Finance & Economics
Yi Cai: School of Software Engineering, South China University of Technology, Guangzhou, China; Key Laboratory of Big Data and Intelligent Robot (SCUT), MOE of China
Qingbao Huang: School of Electrical Engineering, Guangxi University, Nanning, Guangxi, China
Changmeng Zheng: Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China
Zikun Deng: School of Software Engineering, South China University of Technology, Guangzhou, China; Key Laboratory of Big Data and Intelligent Robot (SCUT), MOE of China
Tao Wang: Department of Biostatistics & Health Informatics, King's College London, London, United Kingdom


Abstract

Multimodal Knowledge Editing (MKE) extends traditional knowledge editing to settings involving both textual and visual modalities. However, existing MKE benchmarks primarily assess final-answer correctness, neglecting the quality of intermediate reasoning and robustness to visually rephrased inputs. To address this limitation, we introduce MMQAKE, the first benchmark for multimodal multihop question answering with knowledge editing. MMQAKE evaluates (1) a model's ability to reason over 2–5-hop factual chains that span both text and images, including performance at each intermediate step, and (2) robustness to visually rephrased inputs in multihop questions. Our evaluation shows that current MKE methods often struggle to update and reason consistently over multimodal reasoning chains after knowledge edits. To overcome these challenges, we propose Hybrid-DMKG, a hybrid reasoning framework built on a dynamic multimodal knowledge graph (DMKG) that enables accurate multihop reasoning over updated multimodal knowledge. Hybrid-DMKG first uses a large language model to decompose a multimodal multihop question into sequential sub-questions, then applies a multimodal retrieval model that locates updated facts by jointly encoding each sub-question with candidate entities and their associated images. For answer inference, a hybrid reasoning module operates over the DMKG via two parallel paths: (1) relation-linking prediction and (2) retrieval-augmented generation (RAG) reasoning with large vision-language models. A background-reflective decision module then aggregates the evidence from both paths to select the most credible answer. Experimental results on MMQAKE show that Hybrid-DMKG significantly outperforms existing MKE approaches, achieving higher accuracy and improved robustness to knowledge updates.
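The abstract describes the answering loop only at a high level. The sketch below is a minimal, hypothetical illustration of that control flow, not the authors' implementation. It assumes that the DMKG can be reduced to a Python dict mapping (subject, relation) to object, that the LLM decomposition has already produced (predicted relation, sub-question) pairs, and that images can be omitted entirely. The multimodal retrieval model and the vision-language reader are replaced by a token-overlap scorer, and the background-reflective decision module is replaced by simply keeping the higher-scoring path. All names (`DynamicKG`, `rag_path`, `answer_chain`) and the fictional entities are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class DynamicKG:
    # Hypothetical toy DMKG: (subject, relation) -> object. Images are omitted.
    facts: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def edit(self, subj: str, rel: str, obj: str) -> None:
        # A knowledge edit simply overwrites the stored object.
        self.facts[(subj, rel)] = obj

    def link(self, subj: str, rel: str) -> Optional[str]:
        # Path 1 stand-in: relation linking as an exact (subject, relation) lookup.
        return self.facts.get((subj, rel))

    def retrieve(self, subj: str) -> List[Tuple[str, str]]:
        # Return every (relation, object) pair stored for a subject.
        return [(r, o) for (s, r), o in self.facts.items() if s == subj]

def rag_path(kg: DynamicKG, subj: str, sub_question: str) -> Tuple[Optional[str], float]:
    """Path 2 stand-in (assumption): retrieve the subject's facts and pick the one
    whose relation shares the most tokens with the sub-question. A real system
    would pass the retrieved facts to a vision-language model instead."""
    q_tokens = {t.strip("?.,") for t in sub_question.lower().split()}
    best, best_score = None, 0.0
    for rel, obj in kg.retrieve(subj):
        r_tokens = set(rel.lower().split())
        score = len(q_tokens & r_tokens) / len(r_tokens)
        if score > best_score:
            best, best_score = obj, score
    return best, best_score

def answer_chain(
    kg: DynamicKG, start: str, steps: List[Tuple[str, str]]
) -> Tuple[Optional[str], List[Tuple[str, str, str, str]]]:
    """Answer one hop at a time; each hop's answer becomes the next hop's subject."""
    entity, trace = start, []
    for relation, sub_q in steps:
        linked = kg.link(entity, relation)
        candidates = [
            (linked, 1.0 if linked else 0.0, "link"),
            (*rag_path(kg, entity, sub_q), "rag"),
        ]
        # Simplified decision step: keep the highest-scoring path (ties favor "link").
        answer, _, source = max(candidates, key=lambda c: c[1])
        if answer is None:
            return None, trace
        trace.append((entity, relation, answer, source))
        entity = answer
    return entity, trace

# Usage with fictional entities and one counterfactual edit.
kg = DynamicKG({
    ("Tower A", "located in"): "City B",
    ("City B", "country"): "Country X",
    ("City C", "country"): "Country Y",
})
kg.edit("Tower A", "located in", "City C")
steps = [
    ("located in", "Which city is the tower located in?"),
    ("country of", "Which country is that city in?"),
]
result, trace = answer_chain(kg, "Tower A", steps)
print(result, trace)
```

In this toy run, the second hop illustrates why two parallel paths can help: the predicted relation label "country of" does not exactly match the stored edge "country", so relation linking returns nothing and the retrieval path supplies the answer. The final answer ("Country Y") follows the edited fact rather than the original one.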