MARS: Multimodal Adaptive Reasoning Model for Avoiding Overthinking

Authors

  • Tan Yue, Wangxuan Institute of Computer Technology, Peking University
  • Qiong Wu, Wangxuan Institute of Computer Technology, Peking University
  • Dongyan Zhao, Wangxuan Institute of Computer Technology, Peking University; State Key Laboratory of General Artificial Intelligence

DOI:

https://doi.org/10.1609/aaai.v40i41.40753

Abstract

Multimodal Large Language Models (MLLMs) have shown advanced performance in vision-language tasks. However, existing multimodal reasoning models often produce excessive reasoning steps, which leads to high computational cost and inefficiency. In this paper, we propose the Multimodal Adaptive Reasoning Model (MARS), which adapts its reasoning strategy to question difficulty. Specifically, MARS adopts a three-stage training framework based on our constructed training dataset (MART): 1) CoT Masking Learning, which improves the logical coherence of reasoning by predicting masked reasoning steps; 2) Adaptive Reasoning Instruction Learning, which trains the model to skip or keep reasoning steps according to difficulty level; and 3) CoT Lightweight Reinforcement Learning, which uses a GRPO algorithm based on the Information Bottleneck Principle to reduce CoT length while maintaining performance and generalizability. Results on both in-domain and out-of-domain datasets show that MARS significantly reduces CoT length (a 90.2% decrease) while improving accuracy (by 0.54%), outperforming existing SOTA open-source and proprietary MLLMs.
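
To make the first training stage more concrete, here is a minimal sketch of how a masked-step training example might be built. It is not the authors' implementation: the abstract only says that masked reasoning steps are predicted, so the function name `mask_reasoning_step`, the `<MASK>` placeholder, the list-of-strings representation of a reasoning chain, and the toy chain itself are all assumptions made for illustration.

```python
from typing import List, Tuple

MASK_TOKEN = "<MASK>"  # hypothetical placeholder, not taken from the paper


def mask_reasoning_step(steps: List[str], index: int,
                        mask_token: str = MASK_TOKEN) -> Tuple[List[str], str]:
    """Hide one reasoning step; return the masked chain and the hidden step."""
    if not 0 <= index < len(steps):
        raise IndexError("index must point at an existing reasoning step")
    masked = list(steps)      # copy so the caller's chain is left untouched
    target = masked[index]    # the step a model would be asked to predict
    masked[index] = mask_token
    return masked, target


# Toy reasoning chain (illustrative only, not from the MART dataset).
chain = [
    "The image shows 3 red apples and 2 green apples.",
    "Total apples = 3 + 2.",
    "So there are 5 apples.",
]
masked_chain, target_step = mask_reasoning_step(chain, 1)
```

In this sketch, `masked_chain` would serve as the model input and `target_step` as the prediction target; how MARS actually selects steps to mask and formats the prompt is not stated in the abstract.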

Published

2026-03-14

How to Cite

Yue, T., Wu, Q., & Zhao, D. (2026). MARS: Multimodal Adaptive Reasoning Model for Avoiding Overthinking. Proceedings of the AAAI Conference on Artificial Intelligence, 40(41), 34539–34547. https://doi.org/10.1609/aaai.v40i41.40753

Section

AAAI Technical Track on Natural Language Processing VI