MARS: Multimodal Adaptive Reasoning Model for Avoiding Overthinking

Tan Yue; Qiong Wu; Dongyan Zhao

doi:10.1609/aaai.v40i41.40753

Authors

Tan Yue, Wangxuan Institute of Computer Technology, Peking University
Qiong Wu, Wangxuan Institute of Computer Technology, Peking University
Dongyan Zhao, Wangxuan Institute of Computer Technology, Peking University; State Key Laboratory of General Artificial Intelligence

DOI:

https://doi.org/10.1609/aaai.v40i41.40753

Abstract

Multimodal Large Language Models (MLLMs) have shown strong performance on vision-language tasks. However, existing multimodal reasoning models often generate excessive reasoning steps, which leads to high computational cost and inefficiency. In this paper, we propose the Multimodal Adaptive Reasoning Model (MARS), which adapts its reasoning strategy to question difficulty. Specifically, MARS adopts a three-stage training framework built on our constructed training dataset (MART): 1) CoT (chain-of-thought) Masking Learning, which improves the logical coherence of reasoning by predicting masked reasoning steps; 2) Adaptive Reasoning Instruction Learning, which trains the model to skip or keep reasoning steps according to difficulty level; and 3) CoT Lightweight Reinforcement Learning, which uses a GRPO algorithm based on the Information Bottleneck principle to reduce CoT length while maintaining performance and generalizability. Results on both in-domain and out-of-domain datasets show that MARS significantly reduces CoT length (a 90.2% decrease) while improving accuracy (by 0.54%), outperforming existing SOTA open-source and proprietary MLLMs.
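
The third stage can be illustrated with a minimal sketch of group-relative advantage computation in the style of GRPO, where each sampled response is scored relative to the other responses for the same question. The reward used here (correctness minus a per-token length cost) is an assumption chosen only for illustration; it is not the Information Bottleneck based objective of the paper, and the function names and the `penalty` value are hypothetical.

```python
from statistics import mean, pstdev


def length_penalized_rewards(correct, lengths, penalty=0.001):
    """Hypothetical reward: 1.0 for a correct answer, minus a per-token cost.

    This is an illustrative stand-in, not the reward defined in the paper.
    """
    return [float(c) - penalty * n for c, n in zip(correct, lengths)]


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of sampled responses.

    Each response's advantage is its reward minus the group mean,
    divided by the group standard deviation.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to the same question (made-up values).
correct = [True, True, False, True]
lengths = [400, 40, 300, 120]  # CoT length in tokens

rewards = length_penalized_rewards(correct, lengths)
advantages = group_relative_advantages(rewards)
```

Under this assumed reward, a short correct response receives a larger advantage than a long correct one, which is the pressure toward shorter CoT that the abstract describes at a high level.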
