KSS-MoE: Knowledge Space Synergy Framework in Mixture of Experts for Continual Visual Instruction Tuning

Lingyun Song; Ziyao Chen; Kang Pan; Xiaolin Han; Xinbiao Gan; Yudai Pan; Xiaofan Sun; Xiaoqi Wang; Xuequn Shang

doi:10.1609/aaai.v40i30.39749

Authors

Lingyun Song: School of Computer Science, Northwestern Polytechnical University, Xi'an; Zhejiang Key Laboratory of Intelligent Education Technology and Application, Zhejiang Normal University, Jinhua; Shenzhen Research Institute of Northwestern Polytechnical University, Shenzhen
Ziyao Chen: School of Computer Science, Northwestern Polytechnical University, Xi'an
Kang Pan: Independent Researcher
Xiaolin Han: School of Computer Science, Northwestern Polytechnical University, Xi'an
Xinbiao Gan: School of Computer Science, National University of Defense Technology, Changsha
Yudai Pan: School of Computer Science, Northwestern Polytechnical University, Xi'an
Xiaofan Sun: School of Computer Science, Northwestern Polytechnical University, Xi'an
Xiaoqi Wang: School of Computer Science, Northwestern Polytechnical University, Xi'an
Xuequn Shang: Shenzhen Research Institute of Northwestern Polytechnical University, Shenzhen; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an

Abstract

Multimodal Large Language Models (MLLMs) that adopt the Mixture-of-Experts (MoE) structure show encouraging results on vision-language tasks. However, they suffer from catastrophic forgetting for two reasons: experts do not collaborate effectively, and knowledge transfers negatively across tasks. Poor collaboration arises because the router that MoE typically uses to assign inputs to experts is inadequate when the data distribution shifts significantly from task to task. Negative transfer arises when tasks conflict over shared knowledge, which disturbs previously acquired knowledge and degrades performance on earlier tasks. To address these issues, we propose the Knowledge Space Synergy Framework in Mixture of Experts (KSS-MoE) for Continual Visual Instruction Tuning (CVIT). KSS-MoE dynamically combines the knowledge subspaces of experts to better integrate fine-grained complementary knowledge and strengthen collaboration among experts, thereby overcoming the limitations of the basic router. Furthermore, we introduce a general expert that maintains orthogonal subspaces for shared knowledge, enabling effective cross-task knowledge utilization while reducing negative transfer. Extensive experiments on eight CVIT tasks demonstrate the effectiveness of KSS-MoE, which achieves top-tier performance.
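The abstract names two mechanisms without specifying them: combining expert knowledge subspaces under a router, and keeping a general expert's subspace orthogonal to previously learned ones. The NumPy sketch below illustrates generic versions of both ideas under stated assumptions; it is not the KSS-MoE implementation. It assumes each expert is a LoRA-style low-rank adapter (`A_i`, `B_i`) whose "knowledge subspace" is the row space of `A_i`, that the basic router is a softmax gate, and that orthogonality is either enforced by projection or measured by a squared-overlap penalty. All function names (`combine_expert_subspaces`, `project_out`, `orthogonality_penalty`) are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array."""
    shifted = logits - logits.max()
    exps = np.exp(shifted)
    return exps / exps.sum()


def combine_expert_subspaces(x, router_w, As, Bs):
    """Blend low-rank expert updates B_i @ A_i @ x using softmax router weights.

    Hypothetical layout: x is (d_in,), router_w is (n_experts, d_in),
    each A_i is (r, d_in) and each B_i is (d_out, r).
    """
    gate = softmax(router_w @ x)  # one weight per expert, summing to 1
    delta = sum(g * (B @ (A @ x)) for g, A, B in zip(gate, As, Bs))
    return gate, delta


def project_out(A_new, A_old_list):
    """Remove from the rows of A_new any component lying in the old row spaces."""
    if not A_old_list:
        return A_new
    stacked = np.vstack(A_old_list)  # (k, d_in): all previous subspace directions
    Q, _ = np.linalg.qr(stacked.T)   # (d_in, k): orthonormal basis of that span
    return A_new - (A_new @ Q) @ Q.T


def orthogonality_penalty(A_new, A_old_list):
    """Sum of ||A_old @ A_new.T||_F^2; zero exactly when the row spaces are orthogonal."""
    return sum(float(np.sum((A_old @ A_new.T) ** 2)) for A_old in A_old_list)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, r, n = 16, 4, 3
    x = rng.normal(size=d)
    router_w = rng.normal(size=(n, d))
    As = [rng.normal(size=(r, d)) for _ in range(n)]
    Bs = [rng.normal(size=(d, r)) for _ in range(n)]
    gate, delta = combine_expert_subspaces(x, router_w, As, Bs)
    print("gate weights:", np.round(gate, 3), "update shape:", delta.shape)

    # Hypothetical general expert: keep its subspace orthogonal to an earlier one.
    old_subspaces = [rng.normal(size=(r, d))]
    A_general = project_out(rng.normal(size=(r, d)), old_subspaces)
    print("penalty after projection:", orthogonality_penalty(A_general, old_subspaces))
```

After projection the printed penalty is zero up to floating-point error. Projection imposes orthogonality as a hard constraint, whereas the penalty is the soft alternative that could be added to a training loss; the abstract does not say which approach, if either, KSS-MoE uses.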